Hardware and Software Installation Notes



This chapter describes how to install the Nintendo 64 development board
into a Silicon Graphics Indy workstation. It also describes how to install the
Nintendo 64 development software and where the software components are
located.

This chapter is not a complete installation guide. You must be familiar with
the standard SGI software installation procedures and GIO board
installation in an Indy workstation.

Hardware Installation



The Nintendo 64 Development Board is installed in the Indy workstation as
described in the Indy Workstation Owner's Guide (see the chapter "Installing
the GIO Option Board"). The following instructions supplement that
chapter and serve as errata. Figure 1-1 shows the placement of the
Nintendo 64 Development board in the Indy workstation.

The board is secured in the workstation by screws that attach it to the
standoffs on the base board. When you install the board, be careful not to
damage any jumper wires that may be present on the board.

The Nintendo 64 Development board is not supported by the hinv
command. Once the board and software have been successfully installed,
the boot monitor will echo "U64 Device found" during the power-up
procedure. The application ginv in /usr/src/PR/ginv can be used to print
information about the installed development board such as the RCP version
number, clock speed, and video mode.

Figure 1-1 Nintendo 64 GIO Card (labeled parts: game controller ports, AV out)

The AV out port connector type is the same as that used on the current Super
Nintendo Entertainment System. The cable that connects this port to an
external television can be obtained from most stores that sell the SNES
device. You can buy different cables to support composite, S-Video, RGB, or
other formats that are standard in your country.

Note that the AV out can optionally be routed back to the Indy video input
and audio inputs, allowing you to view and hear the game board on the local
Indy workstation. The workstation accepts composite or S-Video input as
provided on separate SNES cables.

The game controller ports accept RJ-11 connectors (available on the U64
Development game controllers provided by Nintendo). There are connectors
for six ports, though only connectors 1 through 4 are active. The connectors
are named 1 through 6, and are numbered from left to right (when you view
the connectors from the back of the workstation). Plugging a controller into
port 5 will cause the machine to hang.

Software Installation



The Nintendo 64 development software image is not the only software
required for development. Your Indy workstation must also contain the
following 5.3 products:

• dev

• c_dev

• compiler_dev

• gl_dev

• CaseVision, version 2.4

• Workshop, version 2.4

Three products are bundled with the Nintendo 64 development software:

• Gameshop

• ultra 

• dmedia_eoe (versionjpy 

Note: CaseVision and Workshop need to be installed before Gameshop.
Workshop needs to be version 2.4 or earlier.

READMEs and Release Notes 

After installation of the Nintendo 64 development software, you will find a
collection of sample demonstration applications in /usr/src/PR, along with a
README_DEMOS file that describes each application's key features. You
will also find the release notes in /usr/src/PR/relnotes. The release notes
summarize the differences from the last release and various bugs,
workarounds, and caveats of the system.

Other Sources

In /usr/src/PR/assets, you will find the source files for building the general
MIDI bank. We created an initial complete general MIDI bank for testing
purposes. For a game, we assume that you will cut the bank down to
include only those instruments and sounds that you need. Therefore, this
directory gives you a starting point to do that.



In /usr/src/PR/libultra, you will find some pieces of the Nintendo 64
system library code (libultra.a). These are supplied to give a starting point
for writing your own custom versions of these lib components. However,
these sources require extensive source tree build environment tools to
actually build. Therefore, only the non-buildable sources are shipped
currently.



Executables

The first piece of software you will need to use is gload. This program
downloads a ROM image onto the Nintendo 64 development board and
starts execution. Soon after, you will need to use dbgif and gvd to debug your
program.

• /usr/sbin/gload

• /usr/sbin/dbgif

• /usr/sbin/gvd

There are also conversion tools that help in converting data into Nintendo 64
format. For example, flt2c converts a MultiGen database into a C data
structure that can be compiled into binary form. Most of these tools reside in
/usr/sbin, but some are supplied in source form in /usr/src/PR/conv.
Keep in mind that these are templates for your own custom database
conversion tools. We cannot possibly address the needs of all developers.

Chapter 2

Troubleshooting Software Bringup



This chapter describes common problems that you might encounter when
you start bringing up your Nintendo 64 software. The potential problem
areas are:

• operating system

• graphics

• audio

• integration



Operating System 

Game locks up immediately.

A common error is to start the rmon thread at the same priority as the
spawning thread. Rmon then immediately goes to sleep and locks up the
system. The recommended way of starting the system is to create an idle
thread in the boot procedure at a high priority. From the idle thread, start all
the other application threads, then lower the priority to zero and loop
forever to become the idle thread. Note that the rmon thread is not needed
for printfs. See the osSyncPrintf (3P) man page.

Game encounters a CPU exception. 

During the development of your game, you may (intentionally or
unintentionally) encounter various CPU exceptions (or faults) such as TLB
miss, address error, or divide-by-zero. Currently, the system fault handler
saves the context of the faulted thread, stops the faulted thread from
execution, sends a message to any thread registered for the
OS_EVENT_FAULT event, and dispatches the next runnable thread from the
system run queue. If rmon is running, it would register for the
OS_EVENT_FAULT event, receive the message from the exception handler,
stop all user threads (except the idle thread), and send the faulted thread
context to the host. If gload is running on the host, it would receive the
faulted thread context and print its content to the screen. If gvd is running
on the host, it would receive the fault notification and point you to where the
fault occurred. If rmon is not running on the target, you will probably experience
strange behavior (i.e., a hang) in your game since the faulted thread can no
longer run.

If you want to catch the OS_EVENT_FAULT event (instead of using rmon),
you can use two internal OS functions to find the faulted thread and handle
the exception yourself. These are osGetCurrFaultedThread (3P) and
osGetNextFaultedThread (3P). Please refer to their man pages for more
information.



Graphics 

There is no picture on the screen, but the drawing loop is running.

You are probably handing a bad segment address to the RSP graphics
pipeline. This problem is easy to overlook, as there are no warnings. Make
sure you thoroughly understand how a MIPS family processor performs
addressing and how KSEG0 works (most games run in KSEG0). It allows
cached access with no TLB translation. All CPU registers are accessible.
KSEG addresses use the most significant bits of the address to indicate the
addressing modes.

Figure 2-1 CPU KSEG0-3 Addresses

The RSP uses a segment addressing scheme with base pointers. It is very
easy to hand a CPU KSEG0 address to the RSP by mistake and spend hours
locating a simple error. Note that a KSEG0 CPU address would reference an
invalid segment if decoded as an RSP address.

Figure 2-2 RSP Addresses

For example, if you have the following code, the RSP/RDP pipeline will
receive garbage:

Mtx matrix;

gSPMatrix(glist++, &matrix, G_MTX_ );



matrix is a KSEG0 CPU address 0x8xxxxxxx. When this is handed to the RSP,
it fetches garbage. Below is a list of common commands with pointers:

• gDPSetColorImage

• gDPSetTextureImage

• gDPSetMaskImage

• gSPMatrix

• gSPViewport

• gSPVertex

• gSPDisplayList

Keep in mind that CPU addresses and RSP/RDP addresses use different
addressing schemes and are not interchangeable.
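
The difference between the two address spaces can be sketched on the host as follows. This is a minimal illustration, assuming only the standard MIPS rule that a KSEG0 address (0x80000000 through 0x9FFFFFFF) maps to a physical address by clearing the upper three bits. The helper names is_kseg0 and kseg0_to_physical are hypothetical and are not part of the Nintendo 64 library; they show the kind of check you might make before placing a pointer in a display list command.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if addr lies in the MIPS KSEG0 window
 * (0x80000000..0x9FFFFFFF), i.e. a cached, unmapped CPU address. */
static int is_kseg0(uint32_t addr)
{
    return (addr & 0xE0000000u) == 0x80000000u;
}

/* Hypothetical helper: clear the KSEG0 selector bits, leaving the
 * physical address. The caller must pass a KSEG0 address. */
static uint32_t kseg0_to_physical(uint32_t addr)
{
    assert(is_kseg0(addr));
    return addr & 0x1FFFFFFFu;
}
```

For instance, the KSEG0 address 0x80200050 corresponds to physical address 0x00200050, which is a different 32-bit value from the one the CPU uses to reach the same memory.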

One useful way to debug possible display list problems is to link with the
GBI dumping routines in libgu, and print out the display list. This will
immediately show bad pointers and garbage matrices. See the man pages for
guParseGbiDL (3P) and guParseRdpDL (3P).

Ending a Display List

After any recent GBI display list edit, make sure that each display list still
ends with gSPEndDisplayList. Without this, the RSP will probably hang. The RDP requires a
gDPFullSync at the end of the entire display list sequence to make the DP
interrupt the CPU for notification of completion.

Flaky Video

The beginning of the framebuffer and z-buffer addresses must be 64-byte
aligned.



Audio

Alignment Issues

The audio system shares several data structures between the 4300 and the
RCP. In order to avoid alignment problems, any buffer used by both the 4300
and the RCP should be allocated using the alHeapAlloc() routine. This will
generate buffers with 16-byte alignment, avoiding all alignment issues as
well as cache tearing issues.

Size and Number of Buffers

A common error is to run out of buffers, particularly DMA buffers. Because
the number of buffers needed is largely dependent on the music and sound
effects used, it is not possible to provide guidelines. As music and sound
effect complexity increases, the number of buffers needed will increase.

Audio Pops and Clicks

To avoid audio pops and clicks, all samples should start with at least one
value of zero. Upon receiving a pre-NMI message, it is important that the
audio fade to zero output, or on subsequent bootup there is a potential for
a pop. If audio does not run at a high enough priority, the audio may not be
generated before the previous buffer has completed. If this occurs, there will
be a period where no samples are played. This will usually generate a clear
pop.

Integration



DMA Alignment

All DMA transactions in the Nintendo 64 must use 64-bit aligned addresses for data in
RDRAM. DMA transactions for data in ROM must use 16-bit aligned
addresses.



Debugging CPU Faults

The "gdis" disassembler is a powerful debugging aid that can help you
turn a cryptic crash dump (i.e., the text that is printed in your gload window
when your program takes an exception) into useful debugging information.

For example, you can disassemble the section named "code" (as specified in
the specfile) in the "chrome" example application executable as follows:

% gdis -S -t .code.text letters
Here is a portion of the output...



[ 144] 0x80200050:  27 bd ff 90  addiu  sp,sp,-112
[ 144] 0x80200054:  af bf 00 1c  sw     ra,28(sp)
  145:     int i, *pr;
  146:     char *ap;
  147:     u32 *argp;
  148:     u32 argbuf[16];
  149:
  150:     /* notice that you can't call rmonPrintf() until you set
  151:      * up the rmon thread.
  152:      */
  153:
  154:     osInitialize();
[ 154] 0x80200058:  0c 08 04 c4  jal    osInitialize
[ 154] 0x8020005c:  00 00 00 00  nop
  155:
  156:     argp = (u32 *)RAMROM_APP_WRITE_ADDR;
[ 156] 0x80200060:  3c 0e 00 ff  lui    t6,0xff
[ 156] 0x80200064:  35 ce b0 00  ori    t6,t6,0xb000
[ 156] 0x80200068:  af ae 00 60  sw     t6,96(sp)
  157:     for (i=0; i<sizeof(argbuf)/4; i++, argp++) {
[ 157] 0x8020006c:  af a0 00 6c  sw     zero,108(sp)
  158:         osPiRawReadIo((u32)argp, &argbuf[i]); /* Assume no DMA */
[ 158] 0x80200070:  8f af 00 6c  lw     t7,108(sp)
[ 158] 0x80200074:  8f a4 00 60  lw     a0,96(sp)
[ 158] 0x80200078:  27 b9 00 20  addiu  t9,sp,32
[ 158] 0x8020007c:  00 0f c0 80  sll    t8,t7,2
[ 158] 0x80200080:  0c 08 05 4c  jal    osPiRawReadIo
[ 158] 0x80200084:  03 19 28 21  addu   a1,t8,t9
[ 157] 0x80200088:  8f a8 00 6c  lw     t0,108(sp)
[ 157] 0x8020008c:  8f aa 00 60  lw     t2,96(sp)
[ 157] 0x80200090:  25 09 00 01  addiu  t1,t0,1
[ 157] 0x80200094:  2d 21 00 10  sltiu  at,t1,16
[ 157] 0x80200098:  25 4b 00 04  addiu  t3,t2,4
[ 157] 0x8020009c:  af ab 00 60  sw     t3,96(sp)
[ 157] 0x802000a0:  14 20 ff f3  bne    at,zero,0x80200070
[ 157] 0x802000a4:  af a9 00 6c  sw     t1,108(sp)
  159:     }

Notice that the C source is interleaved with the disassembled code, and that
the PC is given in the second column.

When your program crashes, you can look up the error PC listed in the crash
dump (it is identified as "epc") to determine where the program crashed and
find the corresponding line in the source/disassembly listing.
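
The lookup described above can be sketched as a small host-side routine. This is only an illustration: the table rows are transcribed from the listing in this section (the PC where each run of instructions for a source line begins), and the names pc_line, listing, and line_for_epc are hypothetical, not part of gdis or the Nintendo 64 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a source/disassembly listing: the PC at which a run of
 * instructions generated for a source line begins. */
struct pc_line {
    uint32_t pc;
    int line;
};

/* Rows transcribed from the listing above, sorted by ascending PC. */
static const struct pc_line listing[] = {
    { 0x80200050u, 144 },
    { 0x80200058u, 154 },
    { 0x80200060u, 156 },
    { 0x8020006cu, 157 },
    { 0x80200070u, 158 },
    { 0x80200088u, 157 },
};

/* Hypothetical helper: return the source line whose instruction run
 * contains epc, or -1 if epc lies before the first row. */
static int line_for_epc(uint32_t epc)
{
    int found = -1;
    size_t i;

    for (i = 0; i < sizeof(listing) / sizeof(listing[0]); i++) {
        if (listing[i].pc > epc)
            break;              /* rows are sorted, so stop here */
        found = listing[i].line;
    }
    return found;
}
```

With this table, an epc of 0x80200080 (the jal to osPiRawReadIo) resolves to source line 158.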

Chapter 3

Hardware Architecture



This chapter describes the hardware architecture of the Nintendo 64 game
machine, in order to help you write software for the machine. Later sections
of this manual describe the details you need to know to program each
component.

The Nintendo 64 game consists of a number of hardware components that
work together to produce the graphics and audio for the game. The heart of
the system is the Reality Coprocessor (RCP). Attached to the RCP are
memory chips, the MIPS R4300 CPU, and some miscellaneous I/O chips.

The RCP is the center of the game; all data must pass through it. It acts as the
memory controller for the CPU. The RCP runs the graphics and audio
microcode. The display portion of the RCP renders into the graphics
framebuffer located in main memory. The video and audio portions of the
RCP DMA framebuffer and audio data from main memory to drive the
video and audio DACs. Figure 3-1, "Nintendo 64 Hardware Block
Diagram," is a block diagram of the Nintendo 64 system.

Figure 3-1 Nintendo 64 Hardware Block Diagram (blocks: R4300 CPU, Game Cartridge, Cartridge Interface, PBUS, MBUS, PIF, Reality CoProcessor (RCP), Game Controllers)


Execution Overview 

The CPU and RCP are both processors that can execute at the same time. 
Threads execute on the CPU and tasks execute on the RCP. Accesses to main 
memory from threads and tasks also occur in parallel. 

The game program runs on the R4300 CPU as a collection of threads, each of 
which has its own stack. The operating system is a collection of routines that
can be called in a thread. The operating system controls which thread is
running on the CPU. A thread can access all of physical memory. See
Chapter 6, "Operating System Overview," for more information.

Tasks run on the RCP, which is a microcode engine that processes a task list.
Task lists are generated by a thread running on the R4300 CPU and are stored
in main memory. The game program creates the task list, calls an OS routine
to load the appropriate microcode, and then starts the RCP running to
process the task list. The microcode on the RCP reads the task list from main
memory. The RCP task can also write into main memory.



RCP: Reality Coprocessor

The RCP is really a collection of processors, memory interfaces, and control
logic. The Reality Signal Processor (RSP) is the microcode engine that
executes audio and graphics tasks. The Reality Display Processor (RDP) is
the graphics display pipeline that renders into the framebuffer. The memory
interfaces provide access to main memory for the CPU, RSP, RDP, video
interface, audio interface, peripheral devices, and serial game controllers. It
is very important to remember that these interfaces may be active at the
same time and that the RSP and RDP are running in parallel.

Figure 3-2 Block Diagram of the RCP

RSP: Reality Signal Processor

The RSP is the processor used by the graphics and audio microcode. The RSP
consists of a Scalar Unit (SU), a Vector Unit (VU), instruction memory
(IMEM), and data memory (DMEM). The microcode is fetched from IMEM
and has direct access to DMEM. The RSP can also access main memory using
DMA. All memory references in the RSP are physical. However, the
microcode uses a segment address table to translate segmented addresses
provided in the task lists into physical addresses. The IMEM and DMEM are
both 4 KB. The SU implements a subset of the R4000 instruction set. The VU
has eight 16-bit elements.

For information on how the RSP is used to implement part of the graphics
pipeline, see Chapter 12, "RSP Graphics Programming." Chapter 19, "The
Audio Library," describes how the RSP is used in audio processing.



RDP: Reality Display Processor 

The RDP is the graphics display pipeline that executes an RDP display list
generated by the RSP and CPU. The RDP consists of a Rasterizer (RS), a
Texture Unit (TX), 4 KB of texture memory (TMEM), a Texture Filter Unit
(TF), a Color Combiner (CC), a Blender (BL), and a Memory Interface (MI).

The RS rasterizes triangles and rectangles. The TX samples textures loaded
in TMEM. The TF filters the texture samples. The CC combines and
interpolates between two colors. The BL blends the resulting pixels with
pixels in the framebuffer and performs z-buffer and antialiasing operations.
The MI performs the read, modify, and write operations for the individual
pixels at either one pixel per clock or one pixel for every two clocks. The MI
also has special modes for loading the TMEM, filling rectangles (fast clears),
and copying multiple pixels from the TMEM into the framebuffer (sprites).

The RDP accesses main memory using physical addresses to load the
internal TMEM, to read the framebuffer for blending, to read the z-buffer for
depth comparison, and to write the z-buffer and framebuffer. The microcode on
the RSP translates the segmented addresses in the task list into physical
addresses.

The global state registers are used by all stages of the pipeline. There are a
number of sync commands to provide synchronization. For example, a pipe
sync is used before changing one of the rendering modes. This ensures that
all previous rendering affected by the mode change occurs before the mode
change.

The command list for the RDP usually comes directly from the RSP. 
However, it is possible to feed the RDP pipeline from a command list that 
has been stored in main memory. 

See Chapter 13, "RDP Programming," for more information on the RDP.

Video Interface

The video interface reads the data out of the framebuffer in main memory
and generates the composite, S-Video, and RGB signals. The video interface
also performs the second pass of the antialiasing algorithm. The video interface
works in either NTSC or PAL mode and can display 15- or 24-bit color
pixels, with or without filtering, at both high and low resolutions. The video
interface can also scale up a smaller image to fill the screen. For more
information on how to set one of the 28 video modes and control the special
features, see the man page for osViSetMode (3P). Chapter 8, "Input/Output
Functionality," also contains information on the video interface.

Audio Interface

The audio interface reads audio data out of main memory and generates the
stereo audio signal. See Chapter 19, "The Audio Library," and Chapter 8,
"Input/Output Functionality," for more information.

Parallel Interface

The parallel interface is the DMA engine that connects to the ROM cartridge.
The PI Manager thread is used to set up the actual DMA commands for all
other threads. See Chapter 8, "Input/Output Functionality," for the list of
PI functions.

Serial Interface

The serial interface connects the RCP with the game controllers through the
PIF chip. To get the current state of the controllers, the application must send
a command to query all the game controllers. The data will be available
later. See Chapter 8, "Input/Output Functionality," for a list of all the
controller functions.



R4300 CPU 

The R4300 CPU is part of the MIPS R4000 family of processors. The R4300
consists of an execution unit with a 64-bit register file for integer and
floating-point operations, a 16 KB instruction cache, an 8 KB writeback data
cache, and a 32-entry TLB for virtual-to-physical address calculation. The
Nintendo 64 game runs in kernel mode with 32-bit addressing. 64-bit integer
operations are available in this mode. However, the 32-bit calling
convention is used to maximize performance.

For more information on the R4300 and the operating system control of the
CPU, see the MIPS Microprocessor R4000 User's Manual and Chapter 6,
"Operating System Overview."



Memory Issues

The main memory in the system is used in parallel by the R4300 CPU, the
RSP microcode engine, the RDP graphics pipeline, and the other I/O
interfaces of the RCP. The software is responsible for defining the memory
map. See Chapter 9, "Basic Memory Management," for more details.

Addressing

The R4300 CPU can use physical or virtual addresses. The TLB maps virtual
addresses into physical addresses. It is anticipated that programs will
mainly use KSEG0 (cached, unmapped) addresses for instructions and data.

The RSP hardware uses physical addresses. The microcode imposes a
segmented addressing scheme to generate the physical addresses. Bits 24
through 27 of the segmented address are used to index into a 16-entry table
to obtain the base address of the segment. The upper 4 bits are masked off.
The lower 24 bits are an offset into the segment. This scheme is used to create
dynamic RSP task lists easily. The RDP hardware uses physical addresses.
The RSP microcode translates the segmented addresses stored in the task list
into physical addresses. The segment table in the RSP is initialized to all
zeros. Every segment initially references memory starting at zero.
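
The translation just described can be modeled on the host in a few lines of C. This is a sketch under stated assumptions, not the microcode itself: it assumes the physical address is the segment base plus the offset, and the names segment_base, set_segment, and segment_to_physical are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SEGMENTS 16

/* Model of the 16-entry segment table. Static storage starts at all
 * zeros, matching the initial state described in the text. */
static uint32_t segment_base[NUM_SEGMENTS];

/* Hypothetical helper: record the physical base address of a segment. */
static void set_segment(unsigned seg, uint32_t phys_base)
{
    assert(seg < NUM_SEGMENTS);
    segment_base[seg] = phys_base;
}

/* Translate a segmented address: bits 24-27 select the table entry,
 * the upper 4 bits are ignored, and bits 0-23 are the offset.
 * Assumes the result is base + offset. */
static uint32_t segment_to_physical(uint32_t seg_addr)
{
    unsigned seg = (seg_addr >> 24) & 0x0Fu;
    uint32_t offset = seg_addr & 0x00FFFFFFu;

    return segment_base[seg] + offset;
}
```

Because every table entry starts at zero, a segmented address such as 0x02000100 initially resolves to physical address 0x00000100; once segment 2 is given a base of 0x00100000, the same address resolves to 0x00100100.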

Data Cache

The R4300 CPU has an 8 KB writeback data cache. This means that when the
CPU writes a variable, it may not be written to main memory until later.
Since the RSP reads the task list directly from main memory, the dynamic
portion of the task list must be flushed from the data cache before the RSP
starts.

Take care in DMA operations also. The data buffer must be flushed from the
cache before the write from memory occurs. The data buffer must be
invalidated in the cache before a read into memory occurs. If the cache
invalidate does not occur, a writeback from the cache may destroy data that
has just been transferred into main memory by a read DMA. It is also a good
idea to align I/O buffers on the 16-byte data cache line size, to avoid cache
line tearing. Tearing occurs when a buffer and an unrelated variable share a
cache line. The potential writeback of the variable could destroy data read
into the I/O buffer.
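
One way to reason about tearing is to check whether a buffer occupies whole cache lines. The following host-side sketch assumes the 16-byte data cache line size given above; the names DCACHE_LINE, line_floor, line_ceil, and buffer_owns_its_lines are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE 16u  /* data cache line size in bytes, per the text */

/* Round an address down to the start of its cache line. */
static uint32_t line_floor(uint32_t addr)
{
    return addr & ~(DCACHE_LINE - 1u);
}

/* Round an address up to the next cache line boundary. */
static uint32_t line_ceil(uint32_t addr)
{
    return (addr + DCACHE_LINE - 1u) & ~(DCACHE_LINE - 1u);
}

/* Hypothetical check: a buffer cannot share a cache line with an
 * unrelated variable if it starts on a line boundary and its length
 * is a whole number of lines. */
static int buffer_owns_its_lines(uint32_t addr, uint32_t len)
{
    return len > 0
        && line_floor(addr) == addr
        && line_ceil(addr + len) == addr + len;
}
```

A 64-byte buffer at 0x80300000 passes this check, while the same buffer at 0x80300008, or a 40-byte buffer at 0x80300000, leaves part of a cache line available to other data and so is exposed to tearing.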

Alignment 

Note the various alignment restrictions:

• 8-byte alignment for most DMA

• 8-byte alignment for main memory, 2-byte alignment in ROM for PI

• 64-byte alignment for color framebuffers (cfb) and z-buffer

• 8-byte alignment for textures
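
These restrictions lend themselves to simple checks during development. The sketch below is a host-side illustration only: the constant names and the helpers is_aligned and pi_dma_addresses_ok are hypothetical, and the values are taken from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment requirements in bytes, taken from the list above. */
enum {
    ALIGN_DMA_RDRAM   = 8,
    ALIGN_PI_ROM      = 2,
    ALIGN_FRAMEBUFFER = 64,
    ALIGN_TEXTURE     = 8
};

/* True if addr is a multiple of align; align must be a power of two. */
static int is_aligned(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return (addr & (align - 1u)) == 0;
}

/* Hypothetical check for a PI transfer: the main memory address must
 * be 8-byte aligned and the ROM address 2-byte aligned. */
static int pi_dma_addresses_ok(uint32_t rdram_addr, uint32_t rom_addr)
{
    return is_aligned(rdram_addr, ALIGN_DMA_RDRAM)
        && is_aligned(rom_addr, ALIGN_PI_ROM);
}
```

Calling such a check in debug builds before each DMA request or framebuffer setup can surface misaligned buffers early.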

Clock Speeds and Bus Bandwidth 

Various system statistics and bandwidths:

• CPU - 93.75 MHz

• RDRAM - 250 MHz (9-bit bytes at 500 M/sec)

• RCP - 62.6 MHz

• AI - variable, 3000-368000 Hz on NTSC, 3050-376000 Hz on PAL

• VI - depends on mode (NTSC, PAL, MPAL)

• PI - 50 Meg/sec peak, 5 Meg/sec from typical slow ROMs

• SI - really slow



Development Hardware 

The development system consists of a Nintendo 64 game card on a GIO
card for the Indy workstation. The ROM cartridge is replaced by 16
megabytes of RAM, called the ramrom, that is accessible from both the Indy
workstation over the GIO bus and the RCP over the PBUS. The workstation
downloads the game software onto the GIO card and then the Nintendo 64
executes the game. The ramrom is also used to pass information by the
debugger. The 4 megabytes of main memory use the 9-bit RDRAMs. The
color and framebuffers can be placed anywhere in memory.



Figure 3-3 Development System

Chapter 4

Runtime Software Architecture



This chapter describes the runtime Nintendo 64 software architecture. It is
intended as a brief tour of the overall architecture and discusses the basic
design guidelines. More specific details are provided in subsequent
chapters.

This chapter briefly covers the following topics:

• CPU: threads, messages, interrupts, cache coherency, TLBs

• IO: device library, device manager

• Memory: static allocation, region library

• RCP: tasks, command lists, yielding

• Graphics: graphics interface

• Audio: sequencer, audio player, driver, wavetable synthesis

• Application: typical application framework

• Debugger: debugger support for CPU and RSP

Resource Access and Management



The Nintendo 64 game machine is made up of a variety of resources. These
resources include the CPU, memory, memory bus bandwidth, IO devices,
the RSP, the RDP, and peripheral devices. The software is designed to
provide raw access to all of the resources. The software layer basically
translates logical functions and arguments into exact hardware register
settings.

Management of most resources is left up to the game itself. Resources such
as processor access and memory usage are too precious to waste by using
some general management algorithm that is not tailored to a particular
game's requirements. The only management layers provided are the audio
playback and I/O device access.

The audio playback mechanism is fairly consistent from game to game. Only
the sounds themselves are different. Therefore, a general tool to stream
audio playback is useful. The I/O devices can be managed to provide
simultaneous multiple access contexts for different threads. For example,
streaming audio data and paging in graphics data might require sharing
access to the ROM.

Figure 4-1 Application Resources

CPU Access



Message Passing Priority Scheduled Threads 

To provide access to CPU compute cycles, Silicon Graphics provides a
simple CPU scheduler to help the game manage multiple threads of control.
These are the attributes of this scheduling scheme:

• Non-preemptive execution: The currently running thread will continue
to run on the CPU until it wishes to yield. Preemption does occur if
there is a need to service another, higher-priority thread awakened by
an interrupt event. The interrupt service thread must not consume
extensive CPU cycles. In other words, preemption is only caused by
interrupts. Preemption can also occur explicitly with a yield, or
implicitly while waiting to receive a message.

• Priority scheduling: A simple numerical priority determines which
thread runs when a currently executing thread yields or an interrupt
causes rescheduling.

• Message passing: Threads communicate with each other through
messages. One thread writes a message into a queue for another thread
to retrieve.

• Interrupt messages: An application can associate a message to a
particular thread with an interrupt.
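
The message-passing attribute above can be pictured with a small fixed-size queue. This host-side sketch is not the operating system's data structure or API; the names message_t, msg_queue, queue_send, and queue_receive are hypothetical. It also simplifies one point: in the real system a thread waiting on an empty queue gives up the CPU, whereas this sketch simply returns an error.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4

typedef void *message_t;

/* Hypothetical fixed-size message queue shared by two threads. */
struct msg_queue {
    message_t slots[QUEUE_CAPACITY];
    size_t first;   /* index of the oldest queued message */
    size_t count;   /* number of messages currently queued */
};

/* Append a message; returns 0 on success, -1 if the queue is full. */
static int queue_send(struct msg_queue *q, message_t msg)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAPACITY)
        return -1;
    q->slots[(q->first + q->count) % QUEUE_CAPACITY] = msg;
    q->count++;
    return 0;
}

/* Remove the oldest message; returns 0 on success, -1 if empty. */
static int queue_receive(struct msg_queue *q, message_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return -1;
    *out = q->slots[q->first];
    q->first = (q->first + 1) % QUEUE_CAPACITY;
    q->count--;
    return 0;
}
```

Messages come out in the order they were sent, which is what lets one thread hand work items to another without sharing any other state.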



CPU Data Cache 

The R4300 has a writeback data cache to improve CPU performance. That
means that when the CPU reads data, the cache may satisfy the read request,
eliminating the extra cycles needed to access main memory. When the CPU
writes data, the data is written to the cache first and then flushed to main
memory at some point in the future. Therefore, when the CPU modifies data for
the RCP's or IO DMA engine's consumption via memory, the software must
perform explicit cache flushing. The application can choose to flush the
entire cache or just a particular memory segment. If the cache is not flushed,
the RCP or DMA may get stale data from main memory.

Before the RCP or IO DMA engines produce data for the CPU to process, the
internal CPU caches must be explicitly invalidated. You don't want the CPU
to be examining old stale data that is in the cache. The invalidation must
occur before the RCP or DMA engines place the data in main memory.
Otherwise, there is a chance that a writeback of data in the cache will clobber
the new data in main memory.

Since the software is responsible for cache coherency, keeping data regions
on cache line boundaries is a good idea. A single cache line containing
multiple data produced by multiple processors can be difficult to keep
coherent.



No Default Memory Management 

As shown above, the Nintendo 64 operating system provides
multi-threaded message-passing execution control. The operating system
does not impose a default memory management model. It does provide
generic Translation Lookaside Buffer (TLB) access. The application can use
the TLB to provide for a variety of operations such as virtual contiguous
memory or memory protection. For example, an application can use TLBs to
protect against stack overflows.



Timers 

Simple timer facilities are provided, useful for performance profiling,
real-time scheduling, or game timing. See the man page for osGetTime (3P)
for more information.



Variable TLB Page Sizes

The R4300 also has variable translation lookaside buffer (TLB) page size
capability. This can provide additional, useful functionality such as the
"poor man's two-way set-associative cache," because the data cache is 8 KB
of direct-mapped memory and the TLB page size can be set to 4 KB. The
application can roll a 4 KB cache window through a contiguous chunk of
memory without wiping out the other 4 KB in cache.



MIPS Coprocessor 0 Access

A set of application programming interfaces (APIs) is also provided for
coprocessor register access, including the CPU cycle-accurate timer, cause of
exception, and status.



I/O Access and Management

The I/O subsystem provides functional access to the individual I/O
hardware subcomponents. Most functions provide for logical translation to
raw physical access to the I/O device.

Figure 4-2 I/O Access and Management Software Components

[Figure: audio DAC, video DAC, controllers, peripherals, ROM]



PI Manager

Nintendo 64 also provides a peripheral interface (PI) device manager for
multiple threads to access the peripheral device. For example, the audio
thread may want to page in the next set of audio samples, while the graphics
thread needs to page in a future database. The PI manager is a thread that
waits for commands to be placed in a message queue. At the completion of
the command, a message is sent to the thread that requested the DMA.



VI Manager

A simple video interface (VI) device manager keeps track of when vertical
retrace and graphics rendering are complete. It also updates the proper video
modes for the new video field. The VI manager can send a message to the
game application on a vertical retrace. The game can use this to synchronize
rendering the next frame.



Memory Management



No Default Dynamic Memory Allocation

The Nintendo 64 software does not impose a memory map on the game. The
Nintendo 64 system leaves the memory allocation problem up to the game
application. It assumes that the application knows the memory partitioning
scheme most suitable for the particular game. However, the Nintendo 64
library does have a heap library that is available.



Region Library 

The Nintendo 64 system does provide a region allocation library that can
partition a memory region specified by the application into a number of
fixed-sized blocks. This gives the application the capability of using a
dynamic memory allocation scheme. However, the game application must
be able to handle the case when memory in the region has run out.
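
To make the idea of a region of fixed-size blocks concrete, here is a
minimal sketch of such an allocator. It is not the Nintendo 64 region
library: the Region type and the region_init, region_alloc, and region_free
names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical region descriptor; not the library's own type. */
typedef struct {
    unsigned char *base;   /* start of the application-supplied memory */
    size_t block_size;     /* size of each block in bytes */
    size_t block_count;    /* number of whole blocks in the region */
    void *free_list;       /* head of a list threaded through free blocks */
} Region;

/* Carve the caller's memory into fixed-size blocks and link them together. */
void region_init(Region *r, void *mem, size_t mem_size, size_t block_size)
{
    size_t i;

    assert(block_size >= sizeof(void *));
    r->base = (unsigned char *)mem;
    r->block_size = block_size;
    r->block_count = mem_size / block_size;
    r->free_list = NULL;
    /* Walk backwards so that the first block ends up at the list head. */
    for (i = r->block_count; i > 0; i--) {
        void *block = r->base + (i - 1) * block_size;
        memcpy(block, &r->free_list, sizeof(void *));
        r->free_list = block;
    }
}

/* Return a free block, or NULL when the region has run out. */
void *region_alloc(Region *r)
{
    void *block = r->free_list;

    if (block != NULL) {
        memcpy(&r->free_list, block, sizeof(void *));
    }
    return block;
}

/* Give a block obtained from region_alloc() back to the region. */
void region_free(Region *r, void *block)
{
    memcpy(block, &r->free_list, sizeof(void *));
    r->free_list = block;
}
```

The important point for the game is the NULL return from region_alloc():
the caller must check for it and decide what to do when the region is
exhausted.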



Memory Buffer Placement 

There are some optimizations on the placement of memory buffers. For
example, it's best to keep the color and depth buffers on separate 1 MB
memory banks. The RDRAM has an active page register for each megabyte.
Splitting the color and z-buffers into separate megabytes prevents the
memory system from constantly having to change the page register. This
technique minimizes page misses.
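
A quick way to check a proposed buffer layout is to compute which 1 MB
banks each buffer touches. The sketch below does that for physical
addresses. The function names and the example addresses in the note that
follows are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE (1u << 20)   /* 1 MB per bank, as described above */

/* Index of the 1 MB bank that contains a physical address. */
uint32_t bank_of(uint32_t phys_addr)
{
    return phys_addr / BANK_SIZE;
}

/* Nonzero if buffers [a, a + a_len) and [b, b + b_len) touch a common bank. */
int buffers_share_bank(uint32_t a, uint32_t a_len, uint32_t b, uint32_t b_len)
{
    uint32_t a_first, a_last, b_first, b_last;

    assert(a_len > 0 && b_len > 0);
    a_first = bank_of(a);
    a_last = bank_of(a + a_len - 1);
    b_first = bank_of(b);
    b_last = bank_of(b + b_len - 1);
    return a_first <= b_last && b_first <= a_last;
}
```

For example, a 320 x 240 buffer with 16 bits per pixel occupies 153600
bytes. Placing the color buffer at 0x00100000 and the z-buffer at
0x00200000 keeps them in different banks, while placing the z-buffer
directly after the color buffer does not.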



Memory Alignment

The DMA engines responsible for shuffling data around in the hardware all
require 64-bit aligned source addresses, destination addresses, and
lengths. Addresses in ROM do not have this 64-bit alignment restriction.
ROM addresses only need to be 16-bit aligned. The loader from the compiler
suite (see the man page for ld(1)) makes sure that all C-language long
long types are 64-bit aligned.

Using the C language, the stack for a thread must also be 64-bit aligned.
Therefore, all stacks should be defined as long long and type-casted when
calling osCreateThread. See the man page for more details.
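
The stack rule can be sketched as follows. The stack size, the variable
name, and the helper functions are hypothetical. The sketch only shows the
declaration and the cast, not a call to osCreateThread, and it assumes that
the thread is given the high end of the stack because the stack grows
downward.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 8192   /* hypothetical stack size for this sketch */

/* Declaring the stack as long long lets the toolchain place it on a
   long long boundary, which is 64 bits on the compiler described above. */
long long game_stack[STACK_BYTES / sizeof(long long)];

/* Nonzero if the address is a multiple of 8 bytes (64 bits). */
int is_aligned64(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* High end of the stack, cast to a generic pointer for a thread API. */
void *game_stack_top(void)
{
    assert(STACK_BYTES % sizeof(long long) == 0);
    return (void *)(game_stack + STACK_BYTES / sizeof(long long));
}
```

Because the array length is a whole number of long long elements, both ends
of the stack stay 64-bit aligned.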
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RCP Access and Management 



The CPU has control over access to the RCP. The RSP and RDP portions of
the RCP can be used individually, or as a group. The CPU creates a task list
that specifies what microcode to run and what command list to execute. The
task is then run on the RSP. There are OS commands to start the task and to
yield (i.e., preempt) a task. The RDP usually receives graphics rendering
commands directly from the RSP. However, it is also possible to drive the
RDP from a list that is in DRAM.



Graphics Interface



Nintendo 64 uses a display list hierarchy to describe what to render. 3D
geometry transformation and rasterization are accelerated by the RSP and RDP,
respectively. There is no immediate mode rendering. The R4300 CPU
generates the display list in memory, then the RCP fetches the display list
and renders the graphics.



Graphics Binary Interface 

Nintendo 64 renders graphics using a display list interface called graphics
binary interface (GBI). The CPU assembles the GBI structure in RDRAM for
the RSP/RDP to render. The RSP must first be downloaded with graphics
microcode to perform geometry transformation. The RDP performs polygon
rasterization. RSP and RDP state machines are described in more detail in
Chapter 11, "RSP Graphics Programming" and Chapter 13, "RDP
Programming."

Figure 4-3 Graphics Pipeline

[Figure: GBI assembly, RSP (3D geometry)]



GBI Geometry and Attribute Hierarchy

The GBI structure describes a hierarchy of geometry and its attributes. This 
tree is traversed depth first and the graphics pipeline attributes are 
sequentially modified during traversal. Both geometry (RSP) and raster 
(RDP) attributes are contained in a GBI structure. 
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Figure 4-4 Graphics Binary Interface (GBI) of an Airplane 




GBI Feature Set 

The graphics binary interface (GBI) contains many 3D graphics features. An
algorithmic description of many of these features is in the OpenGL
Programmer's Guide. Table 4-1, "GBI Feature Set," on page 62 lists the basic
features of the GBI pipeline.

Table 4-1 GBI Feature Set

Processor    Functionality

CPU          GBI assembly

RSP          matrix stack operations
             3D transformations
             frustum clipping and back-face rejection
             lighting and reflection mapping
             polygon and line rasterization setup

RDP          polygon rasterization
             texturing/filtering
             blending
             z-buffering
             antialiasing


RSP Geometry Microcode 

There are three different versions of RSP geometry microcode: gspFast3D,
gspLine3D, and gspTurbo3D. The gspFast3D microcode is the optimized,
full-featured 3D polygonal geometry microcode. The gspLine3D is the
optimized, full-featured 3D line geometry microcode. The gspTurbo3D is
the optimized, reduced-featured 3D polygonal geometry microcode. All of
these microcode types come in two versions. One version of the microcode
has the RSP output the rasterization and attribute commands directly to the
RDP. The other version outputs RDP commands to DRAM. Writing the RDP
commands to DRAM could be used to overlap graphics and audio. For
example, you could use the RSP for audio processing while the RDP is
processing commands stored in DRAM. Storing the RDP commands in
DRAM may also be useful for debugging.



Audio Interface



Access to the audio subsystem is provided through the functions in the
Audio Library. The Audio Library supports both sampled sound playback
for sound effects and wavetable synthesis from MIDI files for background
music. For more information on the Audio Library, please refer to
Chapter 19, "The Audio Library."



RCP Task Management



Both the audio and graphics libraries provide support for generating
command lists to be executed on the RCP, but they do not handle the
command list execution. It is therefore necessary for the application to
manage the scheduling and execution of RCP tasks (command lists and
microcode) on the RCP. To facilitate this, the development package includes
an example RCP scheduler.



The "Simple" Example 

The structure of the scheduler included with the "Simple" application is
described briefly below. Please refer to the example code in the "Simple"
directory for more details.

The Scheduler Thread 

The scheduler thread is responsible for collecting display/command lists
from other threads and assigning them to RCP tasks for scheduling and
execution so that real-time constraints are met. This thread has the highest
priority of the application threads, to ensure that scheduling occurs
periodically.

The scheduler executes tasks on the RCP based on the retrace interrupt and
then monitors the progress, yielding the graphics tasks periodically to
interleave audio tasks, if necessary.

Other Application Threads

The next highest priority application thread is the Audio Manager thread. It
is responsible for creating audio display lists, sending them to the scheduler
for execution, and transferring the finished audio to the codecs. It has a
higher priority than the game thread, to prevent audio clicks caused when
the audio thread can't meet its real-time constraints.

Note: The Audio Manager thread is essentially a low-level wrapper around
the alAudioFrame call (see "The Synthesis Driver" on page 382 for details).
Higher-level Audio Library calls are made from the game thread.

The game thread is responsible for generating graphics display lists and
sending them to the scheduler for execution. In addition, the game thread
handles the controller input, makes calls to the Audio Library, and performs
other tasks traditionally found in the game's "main loop."



GameShop Debugger



Workshop Debugger Heritage 

The GameShop debugger (gvd) derives its heritage from the Silicon
Graphics Workshop application development tools. It is a source level
windowing debugger environment that enables debugging of both the CPU
and RSP software.



Debugger Components

The debugger is actually composed of several different components shown
in Figure 4-5, "Debugger Components," on page 67.

There are two debugging paths. The first path is a C source level windowing
debugger, gvd, which has most of the features of common multi-threaded
debuggers. It talks to dbgif, which interfaces to the rmon debug thread
through the Nintendo 64 device driver in IRIX.

The second path is the popular printf traces within the application.
rmonPrintf() displays the messages in the shell that executed dbgif.

Figure 4-5 Debugger Components

[Figure: IRIX host machine, Nintendo 64 development board]

The rmon debugger thread is actually a high-priority thread in the game
application and uses many operating system resources. Therefore, the
debugger and rmonPrintf cannot be used to debug system-level code.

For information on using the GameShop Debugger, see Chapter 25, "GameShop
Debugger."



Chapter 5

Compile Time Overview 



This chapter describes the flow of tools required to go from 3D model design
and music composition to cutting the actual ROM cartridge. In addition to
the standard C compiler suite, the Nintendo 64 software release supplies a
number of other tools particular to the Nintendo 64 software development
environment. The source code to some of these tools is provided as an
example to help you create your own customized tools that give your game
an advantage in the game marketplace. This chapter includes the following
sections:

• database modeling

• model space to render space database conversion

• music composition

• wavetable construction

• building ROM images

• host side functionality



Database Modeling



To do real-time 3D graphics, you need modeling tools to create geometry.
Because many off-the-shelf modeling tools are available, there is no
modeling package in the Nintendo 64 development kit from Silicon
Graphics. Nintendo has contracted two top modeling package companies to
provide the database modeling software (MultiGen and Alias).

For texture-map images and traditional sprite-type games, you may
desire image conversion, editing, and paint software. These are not provided
as part of the Nintendo 64 development kit.

All of the example applications and source code, including sample image
conversion programs, use the popular SGI RGB image format. Additional
related, but unsupported, software may be obtained from SGI via the 4Dgifts
product, anonymous ftp via sgi.com, or from the user community on the
internet (see comp.graphics or the comp.sys.sgi hierarchy). One of the
more popular publicly available packages containing image conversion and
manipulation software is PBMPLUS, widely available on the internet.



NinGen

NinGen is a 3D modeling package from MultiGen. It is a derivative of their
traditional 3D modeling software, together with a Nintendo 64 database
format converter. The traditional key strength of MultiGen is their ability to
provide 3D modeling tools for the real-time commercial and military
flight/vehicle simulation market.

From this market, many database techniques developed for a real-time flight
simulator are available in NinGen. Some basic features include:

• Geometric level of detail.

• Binary separating planes for depth-ordered rendering. This is required
if you don't use the z-buffer.

• Many polygon count reduction tools. The goal is the best model with
the lowest polygon count.



Alias



Historically, Alias has provided 3D animation and modeling tools for the
computer-generated film and animation market segment. Beautiful models,
sophisticated motion paths, and fast development time are all vital to
success in this marketplace. Here is a sample of some of the strong features
of the Alias software package:

• NURBS-based modeler provides smooth surfaces on models.

• Motion paths and inverse kinematics give complex motion.

• Special effects such as particle systems, many different kinds of lights,
and texturing capabilities improve picture quality.



Other Modeling Tools 

Besides Alias and MultiGen, there are other modeling packages on the
market. Softimage and Nichimen Graphics are also traditional film and
animation market tool suppliers. On the PC, the Autodesk 3DStudio is
entering the animation market from the very low end of the price spectrum.

Film and animation tools have many features that can be extracted for
real-time animation. Figuring out how to extract these special features out of
these tools can help you give your game application an advantage. For
example, you might be able to use particle system tools to generate texture
maps. Flipping this texture book on some morphing geometry can
approximate the group motion of a system of particles. This may give you
fire, water, and other interesting objects.



Custom Modeling Tools

For special game application requirements, you may need to create your
own custom modeling packages. Obviously, it is time-consuming to build
such a software package in house. The advantage, however, is that you can
customize the databases to the requirements of your game. For example, you
might be able to gain rendering display performance if you are able to give
hints to your modeler about how to order geometry.



Model to Render Space Database Conversion



This section outlines issues you may face when converting from a modeling
database to a rendering database.



Existing Converters

Both NinGen and Alias software packages have database convertors to
convert to the Nintendo 64 format (Graphics Binary Interface).



Custom Convertors

Some of you may want to write your own database convertors because you
want to manage a certain resource or attribute in a different way, tailored to
your game. Silicon Graphics provides a sample convertor, flt2c(1P), from the
MultiGen .flt file format to the Nintendo 64 format. In addition, Silicon
Graphics provides a converter from the SGI IRIS image format to the
Nintendo 64 texture memory format, rgb2c(1P). These sample convertors are
not complete, nor are they designed to be totally efficient; they are just meant
to be a template to help you understand what a convertor is and what it
needs to do.



Conversion Considerations 

There are many efficiency considerations to keep in mind when you are
writing a database convertor. Here are a few:

• Redundant hierarchical transformations should be eliminated.
Transformations should be used for articulated parts or instancing, not
for preserving modeling hierarchy.

• Since the geometry transformation subsystem has a vertex cache, block
loading 16 vertexes to render as many triangles as possible has better
performance.

• On-chip texture memory is not large (4 KB). If you are stamping trees in
your scene, you should render in texture order. Keep in mind that
texture order may require a z-buffer, which requires additional DRAM
bandwidth. You may need to experiment to find the best trade-off for
your game.

• The display pipeline has many attribute states. You may want to
determine which sets are global and local to an object. Learn how to
manage these attributes to best fit the kind of game you are creating.



Gamma Correction



The SNES and Super Famicom do not have gamma correction hardware but
the Nintendo 64 does. Some developers have indicated that the colors on the
Nintendo 64 look "washed out" with gamma correction turned on.

If you are currently writing games for SNES or Super Famicom (or any
machine that does not have gamma correction), your production path is
likely to be set up to compensate for the lack of gamma correction hardware.
In other words, you are probably picking pre-gamma corrected colors. If you
use this same production path and turn Nintendo 64 gamma correction on,
you will get the washed-out effect because you would have gamma corrected
twice.

To undo the first gamma correction, square and shift down by 8 each color
component (for 8 bit color) or rework your path to exclude the
gamma correction step, leaving gamma correction to the hardware.
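
The "square and shift down by 8" operation can be sketched in a few lines
of C. The function names and the packed 0xRRGGBB layout are assumptions
made for this illustration. A conversion tool in your own production path
may store colors differently.

```c
#include <assert.h>
#include <stdint.h>

/* Undo one gamma correction on an 8-bit component:
   square the value, then shift the result down by 8 bits. */
uint8_t undo_gamma8(uint8_t c)
{
    uint32_t squared = (uint32_t)c * c;   /* at most 255 * 255 = 65025 */
    return (uint8_t)(squared >> 8);
}

/* Apply the same operation to each component of a packed 0xRRGGBB color. */
uint32_t undo_gamma_rgb(uint32_t rgb)
{
    uint32_t r, g, b;

    assert(rgb <= 0xFFFFFFu);
    r = undo_gamma8((uint8_t)(rgb >> 16));
    g = undo_gamma8((uint8_t)((rgb >> 8) & 0xFF));
    b = undo_gamma8((uint8_t)(rgb & 0xFF));
    return (r << 16) | (g << 8) | b;
}
```

With this arithmetic a component of 128 becomes 64, and full intensity 255
becomes 254, so dark and middle values are pulled down much more than
bright ones.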

Every step in your production path must be involved in the color selection
process: modeling/paint software, computer monitors, image conversion
software, the game software, and the Nintendo 64 hardware.

Gamma correction on the Nintendo 64 is recommended; the antialiasing and
video hardware work best when it is enabled.



Music Composition



Music composition involves the creation of MIDI sequences and then
importing them into the game. MIDI sequences can be created using any of a
variety of sequencer applications (Performer, Vision, Cubase, MasterTracks,
to name a few). After the sequences are saved as MIDI files, they should be
converted before being included in the game. If you are planning to use the
compact MIDI sequence player, the sequences should be run through
midicmp. If you are using the regular sequence player, the sequences are run
through midicvt. After the sequences are converted, they can be assembled
into sequence banks with the sbk tool. This is optional; MIDI sequences can
be used without being part of a sequence bank. To actually include the
sequences in the game, a segment containing the sequence data should be
added to the spec file. (See the demo app simple for an example of this.)

For information on how to use sequences in a game, see Chapter 19, "The
Audio Library."



Wavetable Construction



The audio library can use either compressed or uncompressed wavetables
for sound reproduction. In either case, the wavetables are first created using
the digital recording/editing system of the sound designer's choice. The
wavetables are then stored as AIFF files. If the samples are to be
compressed, the first step is to produce a compression table using
tabledesign. After the compression table has been built, the wavetable is
compressed using vadpcm_enc. This will generate a type of AIFC file that is
unique to the Nintendo 64. (Note that AIFC files created with other software
tools are not compatible with the compression scheme used by the
Nintendo 64.)

After the wavetables have been converted to AIFC files (or left as AIFF files
if no data compression is desired), they need to be assembled into banks so
that the Audio Library can reference them correctly. To accomplish this, the
sound designer must first create a .inst file, which is a text file that specifies
the parameters for sound playback and the wavetable files. The .inst file is
then used by ic to create the bank files. The bank files can then be included
in the game by placing them in segments in the application's spec file. (The
creation of .inst files and the use of ic is covered in detail in Chapter 20,
"Audio Tools.")



Building ROM Images



A final set of tools, headers, and libraries is available to pack your database
and code into a final ROM image for the Nintendo 64. The Nintendo 64
development environment heavily leverages the C compiler and
preprocessor tools to process symbolic data into binary objects. A ROM
packing tool, makerom(1P), packs these objects into a single monolithic ROM
image, according to a specification of where the objects go.



C Compiler Suite

Currently, the Nintendo 64 development environment has only been
verified with the IRIX 5.3 MIPS C compiler suite. The interfaces provided do
not rely on proprietary features of this compiler; however, backend tools
such as makerom may rely on specifics of the MIPS symbol table format.

It is required that all modules be compiled or assembled with the
-non_shared and -G 0 compilation flags; neither position independent
code nor a global data area is supported. Since the MIPS R4300 supports the
MIPS II instruction set, the -mips2 flag is also recommended, as well as
optimization flags (-O and -O3).



ROM Image Packer 

The ROM image packer (makerom) takes as input relocatable objects created
by the compiler and performs the final relocations of code symbols. To
perform these relocations, it invokes a next generation link editor that allows
objects to be linked at arbitrary addresses specified by the developer. After
these relocations, makerom extracts the code and initialized data portions of
the resulting binary and packs them onto a ROM image. The makerom tool
can also copy raw data files to the ROM as desired.

Note: When building a ROM image for the console (as opposed to the
development system), be sure to

• link with libultra.a and not libultra_d.a

• remove all calls to printf and its variations from your application.

• remove any functions specific to the development board (such as
command line parsing or logging) from your application.



Headers and Libraries 

Although the Nintendo 64 API includes interfaces for a wide variety of
areas, the interfaces are made available by including a single header file,
/usr/include/ultra64.h, and by linking with a single library, /usr/lib/libultra.a
(or /usr/lib/libultra_d.a). The library routines are broken into their finest level
of granularity, so applications "pay as they go", only including routines they
actually use.

Note there are two versions of the Nintendo 64 library: a debug version
(/usr/lib/libultra_d.a) and a non-debug version (/usr/lib/libultra.a). The debug
version of the library provides additional run time checks at the expense of
some space in the ROM and DRAM, as well as some performance. The
kinds of checks performed include argument checking (especially hard to
find alignment problems), improper use of interfaces, audio resource
problems, etc. It is recommended that the debug library be used in initial
development, and then replaced by the non-debug library later in the
development cycle.

In case of error, the game loading program gload(1P) will interpret and
display the errors on the host.



Host Side Functionality



During development, it may be desirable to copy data between the Indy
host and the game. For example, a MIDI sequence could be repeatedly edited
on the host and then played on the Nintendo 64. Of course this could be
accomplished by recreating and downloading the image repeatedly, but the
design cycle could be reduced significantly by simply copying the new
sequence to the Nintendo 64 while the application is still running.

For these applications, a host side, as well as a game side, API is provided.
The game side interfaces are, as always, defined by including
/usr/include/ultra64.h and linking with /usr/lib/libultra[_d].a. The host side
interfaces are declared in /usr/include/ultrahost.h and defined in
/usr/lib/ultrahost.a.



Ultra 64 Operating System

Chapter 6

Operating System Overview 



Overview 

The Nintendo 64 system runs under a small, real-time, preemptive kernel. It
is supplied as a set of run-time library functions, so that only those portions
that are actually used are included in the game's run-time image. In the
remainder of this document, it is referred to as the operating system,
although it is so minimal that it has not been given an official name.

The kernel can be considered as being layered into core functionality and
higher-level system services, as illustrated in Figure 6-1.




Figure 6-1 Nintendo 64 System Kernel

Threads, messages, events, and raw I/O compose the kernel of the Nintendo 64
operating system. Upon this base are built some additional services that
facilitate access to the raw hardware.

In this introductory section, a brief overview of these services will be
provided.



Threads 

All code that runs under the operating system runs in the same address
space. That is, the game runs as one process. While it is possible to structure
a game application as one monolithic program, it is usually advantageous to
subdivide it into smaller, more manageable subprograms called threads.
With its own stack, each thread usually performs one function, often
repetitively. This subdivision leads to simplicity for each thread; thus, it is
easier to "get it right" and can help minimize interference between threads.
The threads section describes these threads, how they are scheduled, and how
various operations may be performed on them.

Threads may be created, destroyed, stopped, or blocked (the latter by
waiting on a message). Threads normally run until they require some
resource or event to continue, at which point they yield the CPU to another
thread. Each thread has an assigned priority level, used to determine which
thread gets the CPU at any given time. In response to an external event, a
thread may be forced to yield control of the CPU. The operating system
preserves the state of the thread properly for restarting at a later time. Thus,
the system can properly be described as preemptive. Threads may even be
preempted during system calls when it is safe to do so.

However, there is no concept of a swap clock or "round-robin" scheduling
as is found in UNIX and other time-sharing systems. Thus, two or more
threads that run at the same priority level do not alternate in use of the CPU.
The thread that "has" the CPU runs until it yields or is preempted by a
higher priority thread in response to an exception.
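
The scheduling rule described above (strict priority, no round-robin among
equal priorities) can be modeled in a few lines. This is a sketch of the
policy only, not the operating system's scheduler. The Thread structure, the
pick_next function, and the convention that a larger number means a higher
priority are all assumptions of this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread record; not the operating system's own structure. */
typedef struct {
    int priority;   /* larger value = higher priority (assumed here) */
    int runnable;   /* nonzero if the thread is not stopped or blocked */
} Thread;

/* Choose the thread that should own the CPU. `current` is the index of the
   running thread, or -1 if none. The running thread keeps the CPU unless a
   strictly higher priority thread is runnable, so threads of equal priority
   never alternate. Returns -1 if no thread is runnable. */
int pick_next(const Thread *t, size_t n, int current)
{
    int best = -1;
    size_t i;

    if (current >= 0) {
        assert((size_t)current < n);
        if (t[current].runnable) {
            best = current;
        }
    }
    for (i = 0; i < n; i++) {
        if (!t[i].runnable) {
            continue;
        }
        if (best < 0 || t[i].priority > t[best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

In this model, two runnable threads at priority 10 never displace each
other. The one that is running stays on the CPU until it blocks or until a
thread with a higher priority becomes runnable.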



Messages 

Since the operating system is message-based, messages are among the most
important of the resources available to the user. Unlike many popular
real-time kernels, no semaphores or event flags are provided. All
synchronization is provided via sending and receiving messages. This has
deliberately been made very efficient, and the lack of other synchronization
primitives should not be a problem. In fact, there are advantages to using
only this mechanism. The operating system code itself is smaller and less
intrusive on game space than it would be if it had to provide multiple
facilities for thread synchronization. Also, since it is often the case that
information must be transferred when threads synchronize, we get more
usage out of a single operation.

Of course, messages are also useful in simply transferring information from
one thread to another. In this operating system, they are also used to transfer
information when a system event occurs.
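
As a rough model of a message queue, the sketch below implements a
fixed-capacity queue of pointer-sized messages in a ring buffer. It is not
the operating system's message API. The MsgQueue type and the mq_init,
mq_send, and mq_recv names are hypothetical, and the sketch returns an error
where a real kernel could block the calling thread instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message type: one pointer-sized value per message. */
typedef void *Message;

typedef struct {
    Message *slots;    /* caller-supplied storage for the messages */
    size_t capacity;   /* number of slots */
    size_t first;      /* index of the oldest queued message */
    size_t count;      /* number of messages currently queued */
} MsgQueue;

void mq_init(MsgQueue *q, Message *slots, size_t capacity)
{
    assert(capacity > 0);
    q->slots = slots;
    q->capacity = capacity;
    q->first = 0;
    q->count = 0;
}

/* Queue a message. Returns 0 on success or -1 if the queue is full. */
int mq_send(MsgQueue *q, Message m)
{
    if (q->count == q->capacity) {
        return -1;
    }
    q->slots[(q->first + q->count) % q->capacity] = m;
    q->count++;
    return 0;
}

/* Remove the oldest message. Returns 0 on success or -1 if empty. */
int mq_recv(MsgQueue *q, Message *out)
{
    if (q->count == 0) {
        return -1;
    }
    *out = q->slots[q->first];
    q->first = (q->first + 1) % q->capacity;
    q->count--;
    return 0;
}
```

Because each message carries a value, a single send both wakes up the
receiver (in a blocking implementation) and hands it the data it needs,
which is the "more usage out of a single operation" point made above.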



Events

The operating system manages interrupts and exceptions on behalf of the
game system in a relatively unobtrusive way. Some interrupts must be
handled by the system code itself. Others require further decoding to
determine which event has actually occurred when the CPU is interrupted.

The exception handler built into the operating system performs the
decoding of interrupts and other exceptions and maps them to system
events. If the system event is one that may be handled by the game itself,
then a message is sent to an associated event mailbox and the game
application is notified. In this way, the game designer can provide an
interrupt handler to deal with the exception as the game requires.



Memory Management

In this operating system, the responsibility of memory management is left
up to the game. That is, the operating system provides no heap or dynamic
memory allocation mechanism for the game. Since the game can access the
entire memory map, it has total control on how memory is partitioned and
used. The operating system simply runs in the kernel mode (kseg0) with
cache and direct mapping enabled. In this mode, the virtual address
0x80000000 is mapped directly to physical address 0x0. The Translation
Lookaside Buffer (TLB) is not used by the operating system to provide
virtual memory support. However, low-level routines are available for game
developers to program the TLBs directly. Furthermore, a region library is
provided to simplify the task of allocating and de-allocating fixed-size
memory buffers.
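
The direct mapping of kseg0 amounts to adding or subtracting a fixed base.
The sketch below shows the arithmetic. The function names are hypothetical,
and the 512 MB segment size is the usual MIPS convention, assumed here
rather than taken from this chapter.

```c
#include <assert.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u   /* virtual base given in the text above */
#define KSEG0_SIZE 0x20000000u   /* 512 MB segment size (assumed) */

/* Convert a kseg0 virtual address to the physical address it maps to. */
uint32_t kseg0_to_phys(uint32_t vaddr)
{
    assert(vaddr >= KSEG0_BASE && vaddr - KSEG0_BASE < KSEG0_SIZE);
    return vaddr - KSEG0_BASE;
}

/* Convert a physical address to its kseg0 virtual address. */
uint32_t phys_to_kseg0(uint32_t paddr)
{
    assert(paddr < KSEG0_SIZE);
    return paddr + KSEG0_BASE;
}
```

So a buffer that the CPU sees at 0x80100000 sits at physical address
0x00100000, which is the form of address a DMA engine would need.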

Game developers should also be aware of the importance of invalidating
and flushing caches before transferring data between either cartridge ROM
or RCP and main memory. The operating system provides useful functions
to invalidate both instruction and data caches and to write back the data
cache.



InputandOutput 

The Nintendo 64 system spends a good deal of its time performing I/O
operations. The operating system provides an optimized I/O interface layer
that directly communicates with the hardware. Some of these interfaces
include:

• VI - the video interface. The interface routines communicate with a
video manager system thread, called the VI/Timer manager. This
thread receives all vertical retrace interrupts and programs the video
hardware. In addition, it also receives all counter interrupt messages
and implements timer services.

• PI - the peripheral interface. The PI also has an associated I/O manager
thread, the PI manager. It manages access to the ROM cartridge so that
two threads do not attempt to DMA from ROM to RAM at the same
time.

• AI - the audio interface. This interface programs the audio hardware to
output the desired sample rate and manages access to the audio data
buffer.

• DP - This is the RDP interface. It is mostly of interest because it has an
associated system event when a DP operation is complete.

• Cont - the controller interface. This interface resets, detects, obtains
status, queries and reads data from the game controllers.
Timers

The operating system provides convenient functions to start and stop both
countdown and interval timers. The timers are expressed in CPU count
register cycles, which depend on the video clock. That is, a counter tick in a
PAL system occurs more frequently than the one in an NTSC system.
Developers can also set and get the real time counter value.

Controller Pack File System

The Nintendo 64 controller supports an add-on RAM pack that can store
either 32 KB or 64 KB of data. The operating system implements a simple file
system on this pack where developers can find, create, delete, read and write
files.



Debugging Support

In addition to the support for the high-level GameShop debugger gvd(1P),
the operating system also provides additional useful facilities for
debugging. Developers can use convenient routines to log messages to a
pre-allocated buffer for delayed transfer to the host Indy. Since this logging
utility has low performance impact, it may be well suited for debugging
real-time problems or running performance analysis. Developers can also
use the printf-like utility osSyncPrintf(3P) to display formatted text messages
on the host Indy.

Boot Procedure

When using the Nintendo 64 development system, the developer needs to
run the game loader gload(1P) program to download the prepared ROM
image into the cartridge memory on the development board. After the
memory image is loaded, gload can optionally read back the memory and
verify the contents. Then, it generates a reset signal to the development
board, causing the R4300 to jump to the reset vector where it starts executing
the boot code from the PIF ROM.

Some of the important tasks performed by the boot code include: 
1. Initialize the R4300 CP0 registers

2. Initialize the RCP (such as halt RSP, reset PI, blank video, stop audio)

3. Initialize RDRAM and CPU caches

4. Load 1 MB of game from ROM to RDRAM at physical address
0x00000400

5. Clear RCP status

6. Jump to game code

7. Execute game preamble code (which is similar to crt0.o and is linked to
game during makerom process):

• clear BSS for boot segment (as defined in the spec file)

• set up boot segment stack pointer

• jump to boot entry routine

8. Boot entry routine should call osInitialize(3P)
Chapter 7



Operating System Functionality 



Overview 

Threads, messages, and events work together to form the core of the
Nintendo 64 operating system. Nintendo 64 applications run under a small,
multithreaded operating system. Simply put, this means that the R4300 CPU
switches between several independent components called threads. Each
thread consists of a sequence of instructions, a stack, and (possibly) static
data that is used only by the thread. Subdividing an application into threads
has several advantages. You can effectively isolate each part of the
application to avoid interference. You can divide your application into small,
easily debugged modules. Since each thread can be written independently
to perform exactly one function, complexity is reduced.

Messages are a mechanism by which threads communicate with one 
another. While this could be done using shared global variables, such an 
approach is often unsafe. One thread must know when it is safe to read data 
that is being written by another. Message passing makes communication
between threads an atomic operation; a message is either available or not
available, and the associated data arrives at the receiving thread at one time.

A second, perhaps more important function of messages is to provide
synchronization between threads. Often a thread reaches a point in its 
execution where it cannot continue until another thread has completed some 
task. In this case, the running thread has no useful work to do, so it should 
yield the processor until the task is completed. You use messages to provide 
the mechanism for the thread to wait until that time. 
Often a thread needs to wait for an exception such as an interrupt.
Exceptions are trapped by the operating system and turned into events.
Threads may register to receive notification of system events by requesting
that the operating system send them a message whenever a system event
occurs.



System Threads, Application Threads, and the Idle Thread 

There are several types of threads in a typical application. There is a
distinction (using priority) between system threads, application threads,
and the idle thread.

The PI manager, described in the I/O section, is typical of system threads. It
acts as a resource manager, allowing multiple user threads to share a critical
resource safely - in this case, the cartridge ROM.

The idle thread, which has the lowest priority (a priority of 0) of any thread
in the system, runs only when all other threads are blocked awaiting some
event. Note that the idle thread is required; the system will not run without
it. The game application itself is composed of user threads. User threads are
defined as those threads having priorities between 1 and 127.



Thread Data Structure

Each thread is associated with a data structure of type OSThread declared by
the user. The address of this structure is the only identifier used in thread
system calls. Since the thread data structure is essentially part of the
application itself, you should take care not to overwrite it inadvertently. The
structure contains the thread's context (mostly this consists of its register
contents) when the thread is not running. Each thread has a priority used in
scheduling, and an identifier used only by the debugger. These are also
maintained in the thread data structure.



Thread State 

A thread is always in one of four states. The state of the thread is maintained
in its thread data structure for use by the operating system. A good
understanding of thread state is helpful in designing your application, since
it leads to a better understanding of how the operating system will behave.

• Running. Only one thread in the system is in running state at a time.
This is the thread that is currently executing on the CPU.

• Runnable. A thread in runnable state is ready to run, but it is not
running because some other thread has higher priority. It will gain
control of the CPU once it becomes the highest-priority runnable
thread.

• Stopped. A stopped thread will not be scheduled for execution. Newly
created threads are in this state. Threads are frequently stopped by the
debugger, and an application may stop a thread at any time. Stopped
threads become runnable via an osStartThread system call.

• Waiting. Waiting threads are not runnable because they are waiting for
some event to occur. A thread that is blocked on a message queue is in
waiting state. Arrival of a message returns a waiting thread to runnable
or running state.



Scheduling and Preemption 

While the OS is running, the highest-priority runnable thread in the system
always has control of the CPU. When a thread gains control of the CPU, it
continues to run until it requires some resource or event to continue. It then
relinquishes control of the CPU and the next highest priority thread gets to
run. Typically, this happens as a result of the running thread calling the
function to receive a message. If no message is present in the message queue,
the running thread will block until a message arrives. Note that the thread is
no longer runnable when it is blocked on a message queue, so it no longer
fits the criterion of being the highest-priority runnable thread.

More frequently, the running thread loses control of the CPU through
preemption. In response to an exception (for example, an interrupt), a higher
priority thread becomes runnable. Since that thread should now be the 
running thread, the state of the interrupted thread will be saved in its thread 
data structure, the state of the newly-runnable thread will be loaded to the 
CPU, and the new thread will resume execution at the point where it last ran. 
The preempted thread is still runnable; it just doesn't have the highest 
priority. When it once again becomes the highest priority thread, it will run 
again from the point where the interrupt occurred. 



NU6-06-0030-001G of October 21, 1996 91 



NINTENDO 64 PROGRAMMING MANUAL DRAFT 



Note that the running thread does not need to be at a sequence point (for
example, a system call) to lose control of the CPU. This fits the classical
description of a preemptive system.

Multiple threads within an application frequently need to synchronize their
execution. For example, thread A cannot continue until thread B has
performed some operation. The message passing functions provide the
needed synchronization mechanism and are described in the chapter on
messages.
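
The scheduling rule in this section (the highest-priority runnable thread
always owns the CPU, and a preempted thread stays runnable) can be
sketched as a small portable model. This is an illustration only, not the
operating system's implementation; the type and function names
(ModelThread, pick_next, reschedule) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the four thread states (not the SDK's own encoding). */
typedef enum { T_RUNNING, T_RUNNABLE, T_STOPPED, T_WAITING } ModelState;

typedef struct {
    int        priority;   /* 0 = idle thread, 1..127 = user threads */
    ModelState state;
} ModelThread;

/*
 * Return the index of the thread that should own the CPU: the
 * highest-priority thread that is running or runnable. Returns -1 if
 * none qualifies, a case the real system avoids by requiring an idle
 * thread.
 */
static int pick_next(const ModelThread *t, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (t[i].state != T_RUNNING && t[i].state != T_RUNNABLE)
            continue;
        if (best < 0 || t[i].priority > t[best].priority)
            best = (int)i;
    }
    return best;
}

/* Hand the CPU to whichever thread pick_next() selects. */
static int reschedule(ModelThread *t, size_t n)
{
    int next = pick_next(t, n);
    for (size_t i = 0; i < n; i++)
        if (t[i].state == T_RUNNING)
            t[i].state = T_RUNNABLE;   /* a preempted thread stays runnable */
    if (next >= 0)
        t[next].state = T_RUNNING;
    return next;
}
```

In this model, marking a waiting high-priority thread as runnable (as the
arrival of a message would) and calling reschedule moves the CPU to that
thread, while the thread that was running drops back to runnable.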



Thread Functions

There are eight functions associated with threads. Please refer to the
reference (man) pages for specifics about the arguments, return values, and
behavior of these functions.

• osCreateThread

This function is called once per thread to notify the system that a thread
is to be created. Creating a thread initializes its thread data structure
with the starting program counter, initial stack pointer, and other
information. Once the thread data structure has been initialized, the
thread can be run.

• osDestroyThread

This function removes a thread from the system. Once called, the
thread cannot be run any more.

• osYieldThread

This function notifies the operating system that the running thread
wishes to yield the CPU to any other thread with higher or equal
priority. If all other runnable threads have lower priority, the running
thread will continue. (In practice, it is not possible for a runnable thread
to have higher priority than the running thread.)

• osStartThread

This function call makes a thread runnable. If the specified thread is of
higher priority than the running thread, the running thread will yield
the CPU. If not, the running thread will continue and the started thread
will wait until it becomes the highest priority thread in the system.
• osStopThread

This function call changes the state of a thread to stopped, after which
the thread will not be able to run until restarted. If the thread was
waiting on a message queue, it will be removed from that queue.

• osGetThreadId

This function returns the ID of the thread assigned when the thread was
created. It is used only by the debugger.

• osSetThreadPri

This function changes the priority of a thread. If the running thread is
no longer the highest-priority runnable thread in the system as a result
of this change, it will yield the CPU to the new highest-priority thread.

• osGetThreadPri

This function returns the running thread's priority level.



Exceptions and Interrupts 

The R4300 CPU used in the Nintendo 64 processes a number of exception
types. Most share a common vector, where the operating system receives
them, reads the CAUSE register, and determines which of the 16 legal causes
occurred. With the exception of the Interrupt cause (which may be either
internal or external), all exceptions are internally generated within the CPU.
For example, an attempt to fetch a word from an odd address will generate
an address error exception.

The operating system has exception handlers for Coprocessor Unusable,
Breakpoint, and Interrupt exceptions. All other exceptions are considered to
be faults and are passed to the fault handler. The fault handler stops the
faulted thread, sends a message to any thread (i.e., rmon) registered for the
OS_EVENT_FAULT event, and dispatches the next runnable thread from the
system run queue. If the debugger is present, a message is sent from the
target to the host and the debugger can show you exactly where the fault
occurred. Breakpoint exceptions are also handled in this way. The debugger
will stop all user threads in the event of a breakpoint or a fault.

When an interrupt occurs, the CAUSE register is examined to see which
interrupt caused the exception. The R4300 supports eight interrupts
described below.



Table 7-1

Name          Cause        Description

Software 1    CAUSE_SW1    Software generated interrupt 1
Software 2    CAUSE_SW2    Software generated interrupt 2
RCP           CAUSE_IP3    RCP interrupt asserted
Cartridge     CAUSE_IP4    A peripheral has generated an interrupt
Pre-nmi       CAUSE_IP5    User has pushed reset button on console
RDB Read      CAUSE_IP6    Indy has read the value in the RDB port
RDB Write     CAUSE_IP7    Indy has written a value to the RDB port
Counter       CAUSE_IP8    Internal counter has reached its terminal count



If the RCP interrupts the R4300, then an RCP register is read to see which of
the RCP interrupts is being asserted. Thus, processing RCP interrupts is a
two step process - first the cause of the CPU interrupt is determined, then
the cause of the RCP interrupt is isolated.
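
The first of those two steps, finding which CPU interrupt is pending, can
be sketched as a bit test on the CAUSE register value. This sketch assumes
the standard MIPS layout, in which the eight interrupt-pending bits occupy
bits 8 through 15 of the register; the authoritative CAUSE_* definitions
are those in PR/R4300.h. The helper first_pending, and its choice to report
the lowest-numbered pending interrupt first, are hypothetical and say
nothing about the order the operating system actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed position of the interrupt-pending field in the CAUSE register. */
#define CAUSE_IP_SHIFT 8u
#define CAUSE_IP_MASK  0xFF00u

/* Interrupt names in Table 7-1 order (Software 1 first, Counter last). */
static const char *const ip_names[8] = {
    "Software 1", "Software 2", "RCP", "Cartridge",
    "Pre-nmi", "RDB Read", "RDB Write", "Counter"
};

/*
 * Return the Table 7-1 name of the lowest-numbered pending interrupt,
 * or NULL if no interrupt-pending bit is set.
 */
static const char *first_pending(uint32_t cause)
{
    uint32_t pending = (cause & CAUSE_IP_MASK) >> CAUSE_IP_SHIFT;
    for (int i = 0; i < 8; i++)
        if (pending & (1u << i))
            return ip_names[i];
    return NULL;
}
```

When the result is "RCP", the second step would read an RCP register to
isolate the specific RCP interrupt; that step is hardware-specific and is
not modeled here.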

Normally, the Nintendo 64 game threads run with all interrupts enabled. It
is possible to change the interrupt masks of the R4300 and RCP via a system
call. Clearly, this must be used with great caution, as disabling a critical
interrupt can cause the system to lock up or prevent real time response.



Events

Once the cause of the interrupt (or other exception) has been determined, it
is mapped to one of 14 events defined for the Nintendo 64 system. Table 7-2
shows the events, why they occur, and who normally registers to receive a
message when each event occurs.

Table 7-2 Events Defined for the Nintendo 64 System

Event Name       Event Description                                         Owner

SW1              System software interrupt 1 asserted
SW2              System software interrupt 2 asserted
CART             Peripheral has generated an interrupt                     OS
COUNTER          Internal counter reached terminal count                   VI/Timer manager
SP               RCP SP interrupt; Task Done/Task Yield                    Game
SI               RCP SI interrupt; controller input available              Game
AI               RCP AI interrupt; audio buffer swap                       Game
VI               RCP VI interrupt; vertical retrace                        VI/Timer manager
PI               RCP PI interrupt; ROM to RAM DMA done                     PI manager
DP               RCP DP interrupt; RDP processing done                     Game
PRENMI           An NMI has been requested and will occur in 0.5 seconds   Game
CPU_BREAK        R4300 has hit a breakpoint                                Rmon
SP_BREAK         RCP SP interrupt; RCP has hit a breakpoint                Rmon
FAULT            R4300 has faulted                                         Rmon
THREAD_STATUS    Thread created or destroyed                               Rmon
Event and Interrupt Functions

• osSetEventMesg

This function call specifies a message queue and message to be sent in
response to a system event.

• osGetIntMask

This function returns the current interrupt mask (including both the
R4300 and RCP masks).

• osSetIntMask

This function specifies a new interrupt mask (including both the R4300
and RCP masks).

Non-Maskable Interrupts and PRENMI

When the console RESET switch is pushed, the hardware generates a HW2
interrupt to the R4300 CPU. The interrupt is serviced by the OS event
handler, which sends a message of type OS_EVENT_PRENMI to the
message queue associated with that event.

The HW2 interrupt will be followed in 0.5 seconds by a non-maskable
interrupt (NMI) to the R4300 CPU (unless the RESET switch is pushed and
held for more than 0.5 seconds, in which case the NMI will occur when the
switch is released).

After the NMI occurs, the hardware is reinitialized, and:

• The first Meg of the game in ROM is copied into the first megabyte
of RAM after the boot address

• The BSS for the boot segment is cleared

• The boot procedure is called.

Note: There are some minor differences between power on reset and
NMI reset. After power on reset, the caches are invalidated. After NMI
reset, the caches are flushed and then invalidated. Also, the power on
reset configures the RAM, while NMI reset leaves the RAMs alone.

After NMI reset, the contents of memory, except for the 1 Meg that is copied
in, are the same as before the NMI occurred. The global variable,
osResetType, is set to 0 on a power up reset and to 1 on an NMI.

If your game does not use the scheduler (see Chapter 24, "Scheduling Audio
and Graphics"), it should set up to respond to the OS_EVENT_PRENMI
event by associating a message queue with the event early in the game code.
This is accomplished as follows:

osSetEventMesg(OS_EVENT_PRENMI, <some_message_queue>, <some_message>)

If your game does use the scheduler, it needs only to test for a message of
type PRE_NMI_MSG on its client message queue. The scheduler performs
the event initialization, and forwards the OS_EVENT_PRENMI message to
the client message queue as soon as it is received.

Exactly how a game should behave when it receives OS_EVENT_PRENMI
includes Nintendo policies on game consistency (such as fading the screen
to black or ramping the audio volume down), but from a technical
standpoint, when the game receives the OS_EVENT_PRENMI message it
should do the following:

• Stop issuing graphics tasks to prevent the RDP from being stopped
in a non-restartable state.

• Stop issuing audio tasks to prevent audio "pops"

• Stop issuing ROM (PI) DMAs

To test this, you can generate an NMI on the development board by running the
following program on the Indy. This is equivalent to pushing the RESET
switch on the Nintendo 64 machine.

/*
 * Program to simulate pressing and releasing the RESET
 * switch on the Ultra 64.
 *
 * Copy this code to resetu64.c and type "make resetu64"
 */
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/u64gio.h>
#include <PR/R4300.h>

#define GIOBUS_BASE 0x1f400000
#define GIOBUS_SIZE 0x200000

int main(void)
{
    int mmemFd;
    unsigned char *mapbase;
    struct u64_board *pBoard;

    /* Open the memory device for reading and writing (mode 2). */
    if ((mmemFd = open("/dev/mmem", 2)) < 0) {
        perror("open of /dev/mmem failed");
        return(1);
    }

    /* Map the development board's GIO bus region into this process. */
    if ((mapbase = (unsigned char *)mmap(0, GIOBUS_SIZE,
            PROT_READ | PROT_WRITE, (MAP_PRIVATE),
            mmemFd, PHYS_TO_K1(GIOBUS_BASE))) ==
            (unsigned char *)-1) {
        perror("mmap");
        return(1);
    }

    /* Assert the NMI reset line, wait briefly, then release it. */
    pBoard = (struct u64_board *)(mapbase);
    pBoard->reset_control = _U64_RESET_CONTROL_NMI;
    sginap(10);
    pBoard->reset_control = 0;

    return(0);
}



Internal OS Functions 

Some of the internal OS functions are briefly described below. Broken into
three groups, these functions are mentioned here to reduce
potential duplicate effort from developers. Most of these functions are
simple routines to access various R4300 registers, Translation Lookaside
Buffer (TLB) information, and the internal active thread queue. Please refer to
the reference (man) pages for specifics about the arguments, return values,
and behavior of these functions.

The first group provides functions to access various common R4300 registers:

• __osGetCause, __osSetCause

These functions return and specify the content of the R4300 Cause
register, respectively.

• __osGetCompare, __osSetCompare

These functions return and specify the content of the R4300 Compare
register, respectively.

• __osGetConfig, __osSetConfig

These functions return and specify the content of the R4300
Configuration register, respectively.

• __osGetSR, __osSetSR

These functions return and specify the content of the R4300 Status
register, respectively.

• __osGetFpcCsr, __osSetFpcCsr

These functions return and specify the content of the R4300
floating-point Control/Status register, respectively.

The second group provides functions to access TLB information:

• __osGetTLBASID

This function returns the TLB Application Space ID in the R4300
EntryHi register.

• __osGetTLBPageMask

For a specified TLB entry, this function returns the content of the R4300
PageMask register.

• __osGetTLBHi

For a specified TLB entry, this function returns the content of the R4300
EntryHi register.

• __osGetTLBLo0

For a specified TLB entry, this function returns the content of the R4300
EntryLo0 register.

• __osGetTLBLo1

For a specified TLB entry, this function returns the content of the R4300
EntryLo1 register.

The third group provides functions to access the internal active thread queue to
find faulted thread(s):

• __osGetCurrFaultedThread

This function returns the most recent faulted thread.

• __osGetNextFaultedThread

This function returns the next faulted thread from the internal active
thread queue.
Chapter 8

Input/Output Functionality



Overview

The input/output (I/O) subsystem exists on most operating systems for
three main reasons:

• to hide device-specific details in device drivers through which the
operating system transfers data and control

• to provide a fair and safe access scheme to the devices, since most of
them are shared resources

• to provide a consistent, uniform, and flexible interface to all devices,
allowing programs to reference devices by name and perform
high-level operations without knowing the device configuration.

Usually, the I/O software is structured in layers:

1. device-independent system interface

2. device drivers

3. interrupt handlers

The interrupt handler is mainly responsible for waking up a device driver 
after an I/O operation completes. The device driver performs 
device-specific operations, such as setting up registers for DMA and 
checking device status. The device-independent system interface provides a 
uniform interface to user-level software and common I/O functions (that is, 
protection, blocking, buffering) that can be performed across different 
devices. 
For the RCP, there are two modes of I/O operation:

• DMA provides a minimum of 64-bit transfer between the RDRAM and
any of the devices

• IO provides a 32-bit transfer between the CPU and any of the devices

The RCP consists of the following major devices and interfaces (see
Figure 8-1):

• Reality Signal Processor (RSP). This internal processor supports both
DMA and IO operations between RDRAM and I/Dmem addresses.

• Reality Display Processor (RDP). This internal processor supports only
DMA from either RDRAM or Dmem addresses to its internal buffer.

• Video Interface (VI). This write-only interface connects to the video
DAC. It supports only DMA from RDRAM to a specific video buffer
address and allows you to change video modes and configurations.

• Audio Interface (AI). This write-only interface connects to the audio
DAC. It supports only DMA from RDRAM to a specific audio buffer
address and allows you to set the audio frequency.

• Peripheral Interface (PI). This read-write interface connects to the ROM
cartridge and other mass storage devices. It supports DMA as well as
IO Read/Write to ROM addresses.

• Serial Interface (SI). This read-write module interfaces to the PIF, which
connects to the game controller and modem devices. It supports DMA
as well as IO Read/Write to PIF RAM addresses.
Figure 8-1 Logical View of RCP Internal Major Devices and Interface Modules

[Figure: block diagram of the RCP with the CPU and RDRAM. Blocks shown
for the RCP: SP, DP, Video Interface (VI), Audio Interface (AI), Peripheral
Interface (PI), and Serial Interface (SI). The VI connects to the Video DAC,
the AI to the Audio DAC, the PI to the ROM Cartridge, and the SI to the PIF,
which connects to the Game Controller.]


Design Approach 

Since Nintendo 64 operates in a real-time environment, its I/O subsystem is
one of the most time-critical areas. Furthermore, the customized Nintendo
64 environment contains a well-known set of device interfaces that will remain
unchanged for some time to come. Therefore, its I/O subsystem is mainly
designed for optimal throughput and response, and not for portability and
generality. This design approach coincides with the main Nintendo 64
design philosophy, which has always been (and still is) to follow the
minimal approach.

The Nintendo 64 I/O subsystem contains these components:

• a device-dependent system interface

• a device manager for shared devices

• a system exception handler

These components represent a much trimmed-down version of the typical
I/O layers. All overhead associated with device-independent interfaces
(that is, naming and buffering) has been removed; protection is
implemented only on shared devices. A low-level (raw) I/O interface is also
available, allowing you to customize device interfaces based upon your
specific needs. The result is a very lightweight and optimized interface that
allows you to access (in most cases) the devices directly.

Each of these components is described further in the sections below.
However, first it is important to discuss some properties (such as synchrony
and mutual exclusion) that the Nintendo 64 I/O subsystem should exhibit.



Synchronous vs. Asynchronous I/O

Synchronous I/O and asynchronous I/O are two fundamental methods of
servicing I/O requests. In synchronous systems, the calling process is
blocked after issuing an I/O request, thus allowing I/O to overlap with the
execution of other processes. In asynchronous systems, the process is
allowed to continue execution after initiating an I/O operation. Most
systems implement the synchronous I/O method since it is easier to use and
generally preferred by high-level language programmers.

However, in the Nintendo 64 environment, asynchronous I/O is the
preferred choice, mainly because of the asynchronous nature of the real-time
game environment. For example, a game might want to start paging in the
next scene data in the background while working on the graphics task list.
Therefore, asynchronous I/O has the potential to enhance the throughput on
a thread basis. Furthermore, synchronous I/O can be easily implemented on
top of the asynchronous facility by having the calling process block on a
message queue immediately after initiating the I/O operation.

Therefore, all interrupt-based DMA operations are asynchronous operations
and all asynchronous notification is handled via the message queue facility.
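
The point that synchronous I/O can be layered on the asynchronous facility
can be shown with a small portable model. Everything here is hypothetical
and stands in for the real message queue and device: Mailbox is a one-slot
queue, Device completes a transfer after a given number of simulated ticks,
and io_sync "blocks" by advancing the simulated device itself, since the
model has only one thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One-slot mailbox standing in for a message queue (model only). */
typedef struct { bool full; int msg; } Mailbox;

/* Simulated device: a transfer completes after ticks_left device ticks. */
typedef struct { int ticks_left; Mailbox *notify; int done_msg; } Device;

/* Asynchronous request: start the transfer and return immediately. */
static void io_start(Device *d, int ticks, Mailbox *mb, int done_msg)
{
    d->ticks_left = ticks;
    d->notify = mb;
    d->done_msg = done_msg;
    mb->full = false;
}

/* Simulated hardware progress; posts the message when the transfer ends. */
static void device_tick(Device *d)
{
    if (d->ticks_left > 0 && --d->ticks_left == 0) {
        d->notify->msg = d->done_msg;
        d->notify->full = true;
    }
}

/*
 * Synchronous request built on the asynchronous one: start the I/O, then
 * wait on the mailbox. A real thread would block here and yield the CPU;
 * this single-threaded model advances the device instead.
 */
static int io_sync(Device *d, int ticks, Mailbox *mb, int done_msg)
{
    assert(ticks > 0);
    io_start(d, ticks, mb, done_msg);
    while (!mb->full)
        device_tick(d);
    mb->full = false;
    return mb->msg;
}
```

An asynchronous caller would use io_start alone and check the mailbox later,
doing other work in between; a synchronous caller uses io_sync and gets the
completion message as the return value.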
Mutual Exclusion

On most systems, some devices such as disks and printers are shared
resources. The I/O subsystem must ensure that only one process can use a
device at any one time, thus excluding other requesting processes and
forcing them to wait.

In the Nintendo 64 environment, each device can process only one I/O
transaction at any given time. For example, if there is a DMA transfer in
progress between ROM and RDRAM, you cannot issue an I/O read from a
different ROM location. If such a read is issued, the current DMA transaction
will probably fail. Therefore, protection (or mutual exclusion) should be
provided for devices that support both DMA operation and I/O read/write.

In this system, mutual exclusion is not implemented as a general scheme for
all devices, but rather as a specific scheme for each identified shared device.



I/O Components 

The Nintendo 64 I/O software subsystem consists of the following major
components: system exception handler, device manager for shared devices,
and device-dependent system interface. Figure 8-2 shows the interaction
between some of these components to service an I/O request. This
interaction assumes that the device is not shared, and therefore, requires no
mutual exclusion.
Figure 8-2 Interactions Between I/O Components

I/O Request

1) App registers an event, a message queue,
and a message with the system

2) App requests I/O operation
(DMA) via the system interface

3) Device interrupts CPU upon I/O
completion

4) Exception Handler notifies App by sending
the registered message to message queue


System Exception Handler

The Nintendo 64 system contains a system-wide exception handler that
traps all exceptions and interrupts. This handler is simply an optimized
event notifier. That is, upon receiving an event (either a supported exception
or interrupt), the handler searches the event table for an associated message
queue and message, sends the message to the queue, and simply returns.
The handler does not perform any device-specific operations. The
osSetEventMesg system call is provided to register a message queue and
a message with a specified event.
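
The event-table lookup described above can be sketched as follows. This is
a portable model under stated assumptions, not the operating system's code:
the Queue type, the table size NUM_EVENTS, and the functions
set_event_mesg and handle_event are all hypothetical stand-ins for the real
message queue, event list, registration call, and exception handler.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_EVENTS 4          /* hypothetical; the real system defines more */
#define QUEUE_CAP  8

/* Minimal FIFO standing in for a message queue (model only). */
typedef struct { int buf[QUEUE_CAP]; size_t head, count; } Queue;

static int queue_send(Queue *q, int msg)
{
    if (q->count == QUEUE_CAP)
        return -1;                                 /* queue full */
    q->buf[(q->head + q->count) % QUEUE_CAP] = msg;
    q->count++;
    return 0;
}

static int queue_recv(Queue *q, int *msg)
{
    if (q->count == 0)
        return -1;                                 /* queue empty */
    *msg = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return 0;
}

/* Event table: one (queue, message) pair per event. */
typedef struct { Queue *mq; int msg; } EventEntry;
static EventEntry event_table[NUM_EVENTS];

/* Model of registration: associate a queue and message with an event. */
static void set_event_mesg(int event, Queue *mq, int msg)
{
    assert(event >= 0 && event < NUM_EVENTS);
    event_table[event].mq = mq;
    event_table[event].msg = msg;
}

/* Model of the handler: look up the event, send its message, return. */
static void handle_event(int event)
{
    assert(event >= 0 && event < NUM_EVENTS);
    if (event_table[event].mq != NULL)
        (void)queue_send(event_table[event].mq, event_table[event].msg);
}
```

Note that the model handler does nothing device-specific: an event with no
registered queue is simply dropped, and a registered event results in exactly
one message being placed on its queue.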



Device Manager 

Depending on the user application, a device in the Nintendo 64 environment 
may be shared between two or more threads. Furthermore, if you want to 
utilize both DMA and IO operations on a device, you must ensure that these 
two operations cannot overlap. For each device that requires protection, you 
can use the concept of a device manager to implement mutual exclusion.

The Device Manager (DM) is simply a thread running at a high priority. The
main purpose of this manager is to process all DMA requests to and from a
device (that is, ROM devices), thus guaranteeing safe and orderly usage of
the device. Upon start-up, the manager registers an event, its event message
queue, and a message with the system. The manager is then blocked
listening on its input command queue for request messages. The manager
simply reads from the front of the queue and processes one request at a time.

After calling the corresponding low-level device routine to initiate the I/O
operation, the manager then blocks listening on the input event queue,
waiting for the event sent from the exception handler, signaling I/O
completion. Once awakened, the manager then notifies the calling thread
(I/O requestor) by simply sending the request message to a pre-registered
message queue. The manager then returns to listen on the input command
queue for new requests.
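
The manager loop described above can be modeled as follows. This is a single-threaded sketch under stated assumptions: the real manager is a thread that blocks on its queues, whereas here start_io and wait_for_event are hypothetical placeholders that only count outstanding transactions, and Queue and Request are illustrative types.

```c
#include <stddef.h>

/* Hypothetical, single-threaded model of the manager loop. */
#define QCAP 8

typedef struct Request Request;

typedef struct {
    Request *buf[QCAP];
    int head, count;
} Queue;

struct Request {
    int    id;
    Queue *reply;   /* queue pre-registered by the requesting thread */
};

static int q_send(Queue *q, Request *r)
{
    if (q->count == QCAP)
        return 0;
    q->buf[(q->head + q->count) % QCAP] = r;
    q->count++;
    return 1;
}

static Request *q_recv(Queue *q)
{
    Request *r;

    if (q->count == 0)
        return NULL;
    r = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return r;
}

static int io_in_flight;   /* outstanding transactions right now */
static int max_in_flight;  /* highest value ever observed */

/* Stands in for the low-level device routine that starts the I/O. */
static void start_io(void)
{
    io_in_flight++;
    if (io_in_flight > max_in_flight)
        max_in_flight = io_in_flight;
}

/* Stands in for blocking on the event queue until I/O completes. */
static void wait_for_event(void)
{
    io_in_flight--;
}

/* Serve the command queue one request at a time.
 * Returns the number of requests served. */
static int manager_run(Queue *cmd)
{
    Request *r;
    int served = 0;

    while ((r = q_recv(cmd)) != NULL) {
        start_io();
        wait_for_event();
        q_send(r->reply, r);   /* notify the requestor */
        served++;
    }
    return served;
}
```

Because the loop never reads the command queue again until the completion event has arrived, max_in_flight never exceeds one in this model.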

The reason for alternating the listening between these two queues
(command and event queues) is that there can be only one outstanding I/O
transaction at any given time. Figure 8-3 summarizes the interaction
between various I/O components to service an I/O request on a shared
device.

Figure 8-3 Interaction Between I/O Components and a Shared Device

Device-Dependent System Interface

The device-dependent system interface is actually composed of two layers
of function calls: a high-level abstraction layer and a low-level, raw I/O
layer. In addition to providing mutual exclusion on devices that support
both DMA and IO operations, the high-level layer also uses the lower layer
to initiate raw I/O operation. The reason for exposing the raw I/O layer is
to allow you to construct your own custom I/O software interface.
Furthermore, if the user application requires no protection for accessing
devices, using the low-level layer directly is the optimal way to request I/O
operation.

In the following sections, the functions are partitioned and described under
each device interface separately. For high-level operation, each function
name starts with os<DeviceName> for easy identification. For low-level
operation, the function name starts with os<DeviceName>Raw. Please refer
to the appropriate reference (man) pages for specifics about the arguments,
return values, and behavior of these functions.

Signal Processor (SP) Functions

• osSpTaskStart

This function loads a task and starts it running.

• osSpTaskYield

This function asks a task running on the SP to yield.

• osSpTaskYielded

This function checks to see if a recently completed task has yielded.

Display Processor (DP) Functions

• osDpGetStatus

This function returns the value of the DP status register. The include file
rcp.h contains bit patterns that can be used to interpret the device status.

• osDpSetStatus

This function allows you to set various features in the DP command
register. Refer to the include file rcp.h for bit patterns and their usage.

• osDpSetNextBuffer

This function sets up the proper registers to initiate a DMA transfer
from RDRAM address to the DP command buffer.

Video Interface (VI) Functions 

• osCreateViManager

This function creates and starts the VI manager (VIM) system thread.

• osViGetStatus

This function returns the value of the video interface status register. The
include file rcp.h contains bit patterns that can be used to interpret the
device status.

• osViGetCurrentLine

This function returns the current half line.

• osViGetCurrentMode

This function returns the current VI mode type.

• osViGetCurrentFramebuffer

This function returns the currently displayed frame buffer.

• osViGetNextFramebuffer

This function returns the next frame buffer to be displayed.

• osViGetCurrentField

This function returns the current field (either 0 or 1) being accessed by the
VI manager.

• osViSetMode

This function sets the VI mode to one of the possible 28 modes. The
new mode takes effect at the next vertical retrace interrupt.

• osViSetEvent

This function registers a message queue with the VI manager to receive
the notification of a vertical retrace interrupt.

• osViSet[X/Y]Scale

These two functions allow you to change the horizontal scale-up factor
(x-scale) and vertical scale-up factor (y-scale), respectively.

• osViSetSpecialFeatures

This function enables/disables various special mode bits in the control
register.

• osViSwapBuffer

This function registers the frame buffer with the VI manager to be
displayed at the next vertical retrace interrupt.

Audio Interface (AI) Functions

• osAiGetStatus

This function simply returns the value of the audio interface status
register. The include file rcp.h contains bit patterns that can be used to
interpret the device status.

• osAiGetLength

This function simply returns the number of bytes remaining in the audio
interface DMA length register.

• osAiSetFrequency

This function configures the audio interface to support the requested
frequency (in Hz). It calculates necessary values to program internal
divisors and returns the closest frequency that the divisors can
generate.

• osAiSetNextBuffer

This function programs the next DMA transfer based on the input
length and starting buffer address.

Peripheral Interface (PI) Functions 

• osCreatePiManager

This function creates and starts the PI manager (PIM) system thread.

• osPiGetStatus

This function simply returns the value of the hardware status register.
The include file rcp.h contains bit patterns that can be used to interpret
the peripheral status (that is, DMA busy and IO busy).

• osPiRawStartDma

This low-level function sets up the proper registers to initiate a DMA
transfer between ROM and RDRAM.

• osPiRaw[Read/Write]Io

These two low-level functions perform an IO (32-bit) read/write
from/to ROM address space, respectively.

• osPi[Read/Write]Io

These two functions perform IO (32-bit) read/write from/to ROM
address space, respectively. Since they provide mutual exclusion for
accessing the PI device, these routines are both blocking I/O calls.

• osPiStartDma

This function generates an asynchronous I/O request to the PI manager
to initiate a DMA transfer between RDRAM and ROM address space.
Upon I/O completion, the PI manager notifies the requestor by returning
the I/O request message to the message queue specified by the
requestor.

Controller Functions

• osContInit

This function initializes all the game controllers and returns a bit
pattern to indicate which game controllers are connected.

• osContReset

This function resets all game controllers and returns their joysticks to
neutral position.

• osContStartQuery

This function issues a query command to all game controllers to obtain
their status and type.

• osContGetQuery

This function returns the game controllers' status and type.

• osContStartReadData

This function issues a read data command to all game controllers to
obtain their input settings.

• osContGetReadData

This function returns the game controllers' joystick data and button
settings.

Chapter 9

Basic Memory Management



Introduction

This chapter

• describes the hardware and software features of the Nintendo 64
platform that relate to memory management, and

• discusses how an application may use them for efficient, correct
memory utilization and access.

The software interface of the Nintendo 64 platform allows you to take
advantage of the hardware capabilities of the machine, which include high
flexibility and high performance. However, with this flexibility comes a
corresponding decrease in ease of programming, which this chapter
addresses.



Hardware Overview 

Recall that the primary processing elements of the machine are the MIPS
R4300 CPU and the Reality Coprocessor (RCP). The CPU executes
application code directly from the DRAM, transparently caching instruction
and data references in on-chip caches. The code itself makes references to
CPU virtual addresses, which are translated by on-chip hardware to
physical memory addresses.

The RCP is primarily composed of two elements: the Signal Processor (SP)
and the Display Processor (DP). The SP is a microcoded engine that
processes task lists for audio and graphics. The DP is, for the most part,
driven by the SP. The RCP can be treated as a single processor for the
purposes of memory management.

Finally, a number of DMA engines also access DRAM directly: the DP, as
well as the Audio Interface (AI), Serial Interface (SI), and Parallel Interface
(PI).

At the hardware level, all of these agents make references to physical DRAM
addresses. These physical addresses are derived in very different ways,
however.



CPU Addressing 

CPU virtual address translation takes place in either of two ways: either via
direct mapping, or through the translation lookaside buffer (TLB). When
running in kernel mode (as applications do on the Nintendo 64 platform) the
address ranges have the behavior described in Table 9-1.

Table 9-1 32-Bit Kernel Mode Addressing

Beginning     Ending        Name     Behavior
0x00000000    0x7fffffff    KUSEG    TLB mapped
0x80000000    0x9fffffff    KSEG0    Direct mapped, cached
0xa0000000    0xbfffffff    KSEG1    Direct mapped, uncached
0xc0000000    0xdfffffff    KSSEG    TLB mapped
0xe0000000    0xffffffff    KSEG3    TLB mapped


The KSEG0 address space is expected to be the most popular, if not the only,
address space used. In this address space, the physical memory location
corresponding to a KSEG0 address can be determined by stripping off the
upper three bits of the virtual address. For example, virtual address
0x80000000 corresponds to physical address 0x00000000, and so on.
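
As a small worked sketch of the rule above, the following C helpers strip or restore the upper three address bits. The names kseg0_to_phys, phys_to_kseg0, and is_kseg0 are hypothetical and introduced only for illustration; applications would normally use the macros and routines supplied with the system instead.

```c
#include <stdint.h>

#define KSEG0_BASE 0x80000000u
#define KSEG1_BASE 0xa0000000u   /* first address past KSEG0 */
#define PHYS_MASK  0x1fffffffu   /* clears the upper three address bits */

/* True when vaddr lies in the direct mapped, cached KSEG0 range. */
static int is_kseg0(uint32_t vaddr)
{
    return vaddr >= KSEG0_BASE && vaddr < KSEG1_BASE;
}

/* KSEG0 virtual address to physical address. */
static uint32_t kseg0_to_phys(uint32_t vaddr)
{
    return vaddr & PHYS_MASK;
}

/* Physical address back to its KSEG0 virtual address. */
static uint32_t phys_to_kseg0(uint32_t paddr)
{
    return paddr | KSEG0_BASE;
}
```

For instance, kseg0_to_phys(0x80000400) yields 0x400, and phys_to_kseg0(0x200000) yields 0x80200000.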






SP Addressing

The SP microcode makes address references also, but these references are
only to the local memory (IMEM and DMEM) on the chip. With the current
software architecture, the application does not program the SP directly, and
need not concern itself with IMEM and DMEM accesses.

DRAM references, however, concern the application, because large data
structures stored in DRAM are passed by reference. These include matrices,
vertex lists, textures, and the display lists themselves. As for the CPU, the
addresses given to the SP for these data objects are also virtual addresses, but
the mapping from virtual to physical address is significantly different. The
SP microcode maintains 16 local offsets in DMEM that act as segment base
registers. An "SP virtual" address is presented to the SP microcode in the
form of a <segment number, segment offset> pair encoded into a 32-bit
word. To compute a physical DRAM address, the microcode adds the
contents of the corresponding segment base register to the given offset.
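
The translation just described can be sketched in C. This is an illustrative model only: segment_base stands in for the microcode's table in DMEM, the function names are hypothetical, and the bit layout (segment number in bits 27..24, offset in bits 23..0) is the one given in the chapter on advanced memory management.

```c
#include <stdint.h>

#define NUM_SEGMENTS 16

/* Stand-in for the 16 segment base registers kept by the microcode. */
static uint32_t segment_base[NUM_SEGMENTS];

/* Pack a <segment number, segment offset> pair into one 32-bit word. */
static uint32_t seg_addr(uint32_t seg, uint32_t offset)
{
    return ((seg & 0xfu) << 24) | (offset & 0x00ffffffu);
}

/* Physical DRAM address = base of the selected segment + offset. */
static uint32_t seg_to_phys(uint32_t addr)
{
    uint32_t seg = (addr >> 24) & 0xfu;
    uint32_t off = addr & 0x00ffffffu;

    return segment_base[seg] + off;
}
```

Changing segment_base[4] from one DRAM region to another makes the same segment address resolve to a different physical location, which is what allows a fixed display list to follow double-buffered data.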

DMA Engine Addressing 

As indicated above, the Nintendo 64 includes DMA engines that access
DRAM directly. Since these DMA operations are initiated by the CPU, the
DRAM addresses passed to the interface routines are CPU virtual addresses.
These routines perform the mapping from virtual to physical addresses and
give the resulting physical DRAM address to the appropriate hardware
registers.

Makerom and Memory Management 

In addition to its more obvious role of creating the application ROM image,
makerom(1P) is a powerful tool for both memory and symbol table
management. Segments to makerom mean more than SP addressable
memory regions. To makerom, a segment is any contiguous, coherent region
of bytes in memory or on the ROM.

The ROM specification file given to makerom provides virtual or segment
addresses to segments. A segment consisting of MIPS R4300 code or data to
run on the CPU can be given a virtual address with an address statement.
A segment consisting of static display list data is given a segment address by
specifying the segment number with a number statement.

Briefly, makerom does the following:

• scans the input specification file for syntax errors;

• sizes the segments, creating absolute symbols for segment addresses
and ROM locations;

• performs final relocations of relocatables that comprise the segment,
using a link editor that can link an arbitrary number of segments to
different addresses;

• extracts the text and initialized data portions for each segment from the
resulting fully linked binary, and packs these portions of the segment
onto the ROM image.



Mixing CPU and SP Addresses

It is permissible to link segments given a CPU virtual address with those
given an SP segment address. It may appear counter-intuitive and
error-prone to link relocatables of entirely incompatible address spaces. As
it turns out, the benefits outweigh the potential risks, because it allows the
application code to refer to SP display list data symbolically.

For example, suppose a segment is composed of the following display list
data:

    static Vp vp = {
            SCREEN_WD*2, SCREEN_HT*2, G_MAXZ/2, 0,  /* scale */
            SCREEN_WD*2, SCREEN_HT*2, G_MAXZ/2, 0,  /* translate */
    };

    Gfx rspinit_dl[] = {
            gsSPViewport(&vp),
            gsSPClearGeometryMode(0xffffffff),
            gsSPSetGeometryMode(G_SHADE | G_SHADING_SMOOTH),
            gsSPEndDisplayList(),
    };



The beginning of the display list rspinit_dl is embedded somewhere in the
segment. Rather than computing its offset into the segment, the display list
is simply provided symbolically:

    gSPDisplayList(glistp++, rspinit_dl);

The compiler and linker do the work of computing the address of
rspinit_dl within the segment. Thus, if the relative location of the display
list rspinit_dl changes, the code will remain valid (and more
readable). Note that the CPU does not reference any of the data in this
display list; the CPU just passes a reference to the display list data to the SP.
A more complicated example involves using the mixed symbol table to work
with memory regions created by the CPU and read by the SP. In this case, a
single SP segment refers to two different underlying DRAM regions. This
technique can be useful when static display lists need to refer to dynamic
data that is double buffered. The actual DRAM location currently being
pointed to is swapped by setting the appropriate SP segment register.

The actual memory for the dynamic data can be declared and created within
a KSEG0 code segment as follows:

    typedef struct {
            Mtx     projection;
            Mtx     modeling;
            Gfx     glist[2048];
    } Dynamic_t;

    Dynamic_t dynamicBuffer[2];
    Dynamic_t *dynamicp = &dynamicBuffer[0];



The segment contents can then be modified by the CPU directly:

    guOrtho(&dynamicp->projection,
            -SCREEN_WD/2.0, SCREEN_WD/2.0,
            -SCREEN_HT/2.0, SCREEN_HT/2.0, 1, 10, 1.0);
    guRotate(&dynamicp->modeling, theta, 0.0, 0.0, 1.0);

The SP view of the dynamic segment is created by creating a relocatable with
the following parallel definition and assigning it to, for example, segment
register 4 in the ROM specification file:

    Dynamic_t rspdynamic;

Since the relocatable contains only uninitialized data (bss), no actual bits on
the ROM are used. But more importantly, the symbol rspdynamic is made
available to other objects. Its value is the segment address of the dynamic
segment.

The SP segment register 4 is then mapped to the actual memory for the
dynamic segment with the following command:

    gSPSegment(glistp++, 4, osVirtualToPhysical(dynamicp));

Then the SP addresses of the dynamic structure can be used, even from static
display lists, to build display lists that reference components of the dynamic
section:

    gsSPMatrix(&rspdynamic.projection,
            G_MTX_PROJECTION|G_MTX_LOAD|G_MTX_NOPUSH),
    gsSPMatrix(&rspdynamic.modeling,
            G_MTX_MODELVIEW|G_MTX_LOAD|G_MTX_NOPUSH),

As with the previous example, using the compiler and linker to generate
addresses allows the data structures to be modified, reordered, and so on,
without changes to unaffected areas of the application.



Flushing the CPU Data Cache 

The MIPS R4300 CPU transparently caches data accesses in an onboard data
cache. Ordinarily this cache is of no concern to the application, but when an
external agent such as the SP or DMA engine is involved, the application
must be aware of the caching implications.

The data cache implements a "write back" replacement policy, which means
that data stores are held in the cache until the entire cache line is written
back, usually due to a cache miss that requires the same cache line. The cache
is not coherent with respect to physical memory and thus cache lines must
be explicitly written back to memory prior to their use by another processor
such as the SP.

Using the above example, the dynamic data can be written back with a single
procedure call as follows. It is expected that this will be done prior to the
task list being executed by the SP.

    osWritebackDCache(dynamicp, sizeof(Dynamic_t));
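
The effect of the write-back policy on an external agent can be illustrated with a toy model in C. Everything here is a simplifying assumption made for illustration: the cache holds a single line, the line size is arbitrary, and the names (dram, cpu_store, agent_read, writeback_line) are hypothetical rather than part of the system software.

```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 16            /* illustrative line size in bytes */
#define MEM_SIZE  64

static uint8_t dram[MEM_SIZE];  /* what an external agent (SP, DMA) sees */

/* A one-line toy cache. */
static struct {
    int      valid, dirty;
    uint32_t base;              /* line-aligned address held in the cache */
    uint8_t  data[LINE_SIZE];
} line;

/* Explicit write-back: copy a dirty line out to DRAM. */
static void writeback_line(void)
{
    if (line.valid && line.dirty) {
        memcpy(&dram[line.base], line.data, LINE_SIZE);
        line.dirty = 0;
    }
}

/* CPU store: lands in the cache, not in DRAM. */
static void cpu_store(uint32_t addr, uint8_t value)
{
    uint32_t base = addr & ~(uint32_t)(LINE_SIZE - 1);

    if (!line.valid || line.base != base) {
        writeback_line();                        /* a miss evicts the old line */
        memcpy(line.data, &dram[base], LINE_SIZE);
        line.base  = base;
        line.valid = 1;
    }
    line.data[addr - base] = value;
    line.dirty = 1;
}

/* External agent read: bypasses the CPU cache entirely. */
static uint8_t agent_read(uint32_t addr)
{
    return dram[addr];
}
```

In this model, a value stored by cpu_store is invisible to agent_read until either writeback_line is called or a later miss happens to evict the line, which is why the application should not rely on eviction and should write the data back explicitly.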



Clearing the Uninitialized Data (bss) Section

Prior to loading a segment into memory, the application must invalidate the
corresponding cache lines. The makerom(1P) tool makes appropriate symbols
available to the application that can be used to construct the arguments to
the osInvalDCache(3P) routines. Then the actual DMA from ROM to DRAM
may be performed, as well as the clearing of the uninitialized data (bss)
section of the segment. It is important that the clearing be performed before
the bss section can be used. Again, makerom(1P) generated symbols may be
used for the bzero call. Here is some sample code that illustrates the process:

    extern char _newSegmentRomStart[], _newSegmentRomEnd[];
    extern char _newSegmentStart[];
    extern char _newSegmentDataStart[], _newSegmentDataEnd[];
    extern char _newSegmentBssStart[], _newSegmentBssEnd[];

    osInvalDCache(_newSegmentDataStart,
            _newSegmentDataEnd-_newSegmentDataStart);
    osPiStartDma(&dmaIOMessageBuf, OS_MESG_PRI_NORMAL, OS_READ,
            (u32)_newSegmentRomStart, _newSegmentStart,
            (u32)_newSegmentRomEnd - (u32)_newSegmentRomStart,
            &dmaMessageQ);
    bzero(_newSegmentBssStart,
            _newSegmentBssEnd-_newSegmentBssStart);
    (void)osRecvMesg(&dmaMessageQ, NULL, OS_MESG_BLOCK);



Physical Memory Allocation 

The Nintendo 64 hardware contains four megabytes of "nine bit" DRAMs.
The normally hidden ninth bit is used for antialiasing and z-buffering
hardware. It is recommended that the framebuffer and z-buffer reside on
different megabyte banks to take advantage of caching in the DRAM
circuitry.

By default, the boot location resides at direct mapped address 0x80000400
(or physical address 0x400). The first 1024 (0x400) bytes of physical memory
are reserved for exception vectors and configuration parameters. This boot
location can be changed by simply inserting an address statement in the boot
segment of the makerom(1P) specification file. For example, the following
code specifies the boot location to be at 0x80200000, which is the beginning
of the third megabyte of memory.

    beginseg
            name "code"
            flags BOOT OBJECT
            entry boot
            address 0x80200000
            stack bootStack + STACKSIZE
            include "codesegment.o"
            include "$(ROOT)/usr/lib/PR/rspboot.o"
            include "$(ROOT)/usr/lib/PR/gspFast3D.o"
            include "$(ROOT)/usr/lib/PR/gspFast3D.dram.o"
            include "$(ROOT)/usr/lib/PR/aspMain.o"
    endseg


The boot process of the Nintendo 64 will copy one megabyte of data
beginning with the boot segment specified in the specification file to the boot
location.

Chapter 10

Advanced Memory Management

Introduction

This chapter explores techniques and features that are not required in the
simplest applications. It contains useful information and tricks that may
be used in certain situations, but it is not expected that all applications will
use all the techniques described here.



Mixing CPU and SP Data 



In the previous chapter it was implied that CPU and SP data should be in
separate segments as they are addressed differently. This is not mandatory,
however, as the addressing can be easily reconciled. Suppose the application
defines a display list and includes it in a segment given a CPU addressable
KSEG0 address. The physical address of this display list can be easily
determined with the OS_K0_TO_PHYSICAL(3P) macro or the
osVirtualToPhysical(3P) routine. The resulting physical address corresponds
to an SP address with a segment number of 0, and a segment offset equal to the
physical address. This is because the encoding of the SP segment address is
as follows:

    31     28 27     24 23                     0
    +--------+---------+-----------------------+
    |  xxxx  |  segID  |    segment offset     |
    +--------+---------+-----------------------+

If the application creates a mapping using segment 0 to a beginning physical
address of 0x0, the SP can correctly access objects in DRAM when given a
physical address.

This simplifies the situation somewhat, but the SP microcode takes it a step
further: Since the upper four bits of a segment address are not used, they are
ignored. Thus an implicit mapping is done from a KSEG0 address to a
physical address, and no explicit conversion need be done by the
application.

To summarize, as long as the segment table mapping is done from
segment number 0 to offset 0, CPU KSEG0 addresses can be interpreted
correctly by the SP.
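
The implicit mapping summarized above can be checked with a short C sketch. It is a model under stated assumptions: segment_base and sp_resolve are hypothetical names that mimic the described decoding, segment 0 is left at base 0x0, and the equivalence holds only while the physical address fits in the 24-bit offset field (which covers the four megabytes of DRAM).

```c
#include <stdint.h>

/* Stand-in segment table; entry 0 is left at physical address 0x0. */
static uint32_t segment_base[16];

/* Resolve an address the way the text describes: bits 31..28 are
 * ignored, bits 27..24 select the segment, bits 23..0 are the offset. */
static uint32_t sp_resolve(uint32_t addr)
{
    return segment_base[(addr >> 24) & 0xfu] + (addr & 0x00ffffffu);
}

/* What an explicit CPU-side conversion of a KSEG0 address would give. */
static uint32_t kseg0_to_phys(uint32_t vaddr)
{
    return vaddr & 0x1fffffffu;
}
```

For a KSEG0 address such as 0x80200000, both functions return 0x00200000, so the display list pointer can be handed to the SP without conversion.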



Using Overlays 

The total application code size and data will probably be greater than what
is actively being used at any point in time. To conserve DRAM, applications
may choose to only have active code and data resident. To facilitate this, the
application can be partitioned into a number of segments, where some
segments share the same memory region during different phases of
execution. Here is an excerpt from a specification file that contains a kernel
code segment that can call routines in either of two overlay segments, texture
and plain.

    beginseg
            name "kernel"
            flags BOOT OBJECT
            entry boot
            stack bootStack + STACKSIZE
            include "kernel.o"
            include "$(ROOT)/usr/lib/PR/rspboot.o"
            include "$(ROOT)/usr/lib/PR/gspFast3D.o"
    endseg

    beginseg
            name "plain"
            flags OBJECT
            after "kernel"
            include "plain.o"
    endseg

    beginseg
            name "texture"
            flags OBJECT
            after "kernel"
            include "texture.o"
    endseg

    beginwave
            name "overlay"
            include "kernel"
            include "plain"
            include "texture"
    endwave

Note the use of the after keyword to place both of the overlay segments at the
same address.



Prior to loading a segment into memory, the application must invalidate the
corresponding instruction and data cache lines. The makerom(1P) tool makes
appropriate symbols available to the application that can be used to
construct the arguments to the osInvalICache(3P) and osInvalDCache(3P)
routines. Then the actual DMA from ROM to DRAM may be performed, as
well as the clearing of the uninitialized data (bss) section of the segment.
Again, makerom(1P) generated symbols may be used for the bzero() call. After
the segment is loaded, any procedure in the segment may be called or any
data in the segment referenced. Here is some sample code that illustrates the
entire process:

    extern char _plainSegmentRomStart[], _plainSegmentRomEnd[];
    extern char _plainSegmentStart[];
    extern char _plainSegmentTextStart[], _plainSegmentTextEnd[];
    extern char _plainSegmentDataStart[], _plainSegmentDataEnd[];
    extern char _plainSegmentBssStart[], _plainSegmentBssEnd[];

    osInvalICache(_plainSegmentTextStart,
            _plainSegmentTextEnd-_plainSegmentTextStart);
    osInvalDCache(_plainSegmentDataStart,
            _plainSegmentDataEnd-_plainSegmentDataStart);
    osPiStartDma(&dmaIOMessageBuf, OS_MESG_PRI_NORMAL, OS_READ,
            (u32)_plainSegmentRomStart, _plainSegmentStart,
            (u32)_plainSegmentRomEnd - (u32)_plainSegmentRomStart,
            &dmaMessageQ);
    bzero(_plainSegmentBssStart,
            _plainSegmentBssEnd-_plainSegmentBssStart);
    (void)osRecvMesg(&dmaMessageQ, NULL, OS_MESG_BLOCK);



Using Multiple Waves

The previous example linked both overlays into a single, fully relocated
binary. This binary is used for two purposes. First, the text and data sections
are extracted from this binary and packed on the ROM. Second, this binary
can be given to the Nintendo 64 debugger, gvd(1P). Although the
specification file above will create an operationally correct ROM image, the
binary will confuse the debugger. This is because multiple symbols will map
to the same address, and gvd may err when it tries to find the correct source
line for a given program counter value, for example.

This problem can be circumvented by creating multiple binaries, or waves,
each with a distinct symbol table. The following specification file excerpt
illustrates this:

    beginwave
            name "plain_wave"
            include "kernel"
            include "plain"
    endwave

    beginwave
            name "texture_wave"
            include "kernel"
            include "texture"
    endwave

Using this technique, procedure and variable names from the plain segment
are kept distinct from those of the texture segment. The "Switch Executable"
menu entry from the gvd "Admin" menu can be used to select the symbol
table to use while debugging.

There is one significant caveat when using multiple waves. The contents of
each segment must be identical in each of the waves the segment is included
in. For example, the kernel segment above is included in both plain_wave and
texture_wave, so its relocated image must be identical in both. The usual
consequence of this rule is that the segment procedure entry point in both of
the overlay segments must be at the same location. This requirement can be
easily met by ensuring that the segment procedure is always the first
procedure of the first relocatable that comprises the overlay segment. Then
the calling segment code can always jump to the beginning address of the
overlay segment(s) and execute valid code there.



Using the Region Allocation Routines 

Previous examples were primarily concerned with static memory allocation;
many applications may find it necessary to do some form of dynamic
allocation. For situations where the allocation is always done in fixed size
chunks, a family of region allocation routines is provided. These routines
will carve up a larger buffer into fixed size memory regions that are
managed by the library. The routines of interest are:

• osCreateRegion

This function initializes an allocation arena given a memory address,
size, and alignment.

• osMalloc

This function allocates and returns the address of a single fixed size
and properly aligned buffer from a given region. This function will fail
and return NULL if there is no available free buffer in the region.

• osFree

This routine returns a previously allocated buffer to the given region
pool.

• osGetRegionBufCount

This function returns the total number of buffers in the region.

• osGetRegionBufSize

This function returns the actual buffer size, after having been possibly
padded to the given alignment.



The following code sample creates a region, allocates a buffer, and then frees
it.

    void    *region;
    char    regionMemory[REGION_SIZE];
    u64     *buffer;

    region = osCreateRegion(regionMemory,
            sizeof(regionMemory),
            BUFFER_SIZE, OS_RG_ALIGN_16B);
    buffer = osMalloc(region);
    /* do some work that uses 'buffer' */
    osFree(region, buffer);

Incidentally, if the fixed size regions are intended to hold entire segments,
the maxsize keyword of the makerom specification file may be of interest. See
makerom(1P) for details.
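
To show what carving a larger buffer into fixed size, aligned buffers involves, here is a minimal free-list sketch in C. It is not the library's implementation: the Region type and the region_init, region_alloc, and region_free names are hypothetical, the bookkeeping lives outside the managed memory, and align is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size allocator; names and layout are illustrative. */
typedef struct {
    unsigned char *start;      /* first aligned buffer */
    size_t         buf_size;   /* buffer size after padding to alignment */
    size_t         buf_count;  /* total number of buffers in the region */
    void          *free_list;  /* list threaded through the free buffers */
} Region;

/* The "next" pointer of a free buffer is stored in its first bytes. */
static void *load_next(const void *buf)
{
    void *next;
    memcpy(&next, buf, sizeof next);
    return next;
}

static void store_next(void *buf, void *next)
{
    memcpy(buf, &next, sizeof next);
}

/* align must be a power of two. */
static void region_init(Region *r, void *mem, size_t size,
                        size_t buf_size, size_t align)
{
    uintptr_t raw  = (uintptr_t)mem;
    uintptr_t base = (raw + align - 1) & ~(uintptr_t)(align - 1);
    size_t skipped = (size_t)(base - raw);
    size_t i;

    if (buf_size < sizeof(void *))
        buf_size = sizeof(void *);          /* room for the list link */
    r->start     = (unsigned char *)mem + skipped;
    r->buf_size  = (buf_size + align - 1) / align * align;
    r->buf_count = skipped < size ? (size - skipped) / r->buf_size : 0;
    r->free_list = NULL;
    /* Push in reverse so the first allocation returns the lowest address. */
    for (i = r->buf_count; i > 0; i--) {
        void *buf = r->start + (i - 1) * r->buf_size;
        store_next(buf, r->free_list);
        r->free_list = buf;
    }
}

/* Returns NULL when no free buffer remains. */
static void *region_alloc(Region *r)
{
    void *buf = r->free_list;

    if (buf != NULL)
        r->free_list = load_next(buf);
    return buf;
}

static void region_free(Region *r, void *buf)
{
    store_next(buf, r->free_list);
    r->free_list = buf;
}
```

Allocation and release are both constant-time list operations, which is the main attraction of fixed size regions over a general-purpose allocator.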



Managing the Translation Lookaside Buffer 

Although most applications will find the direct mapped KSEG0 address
space of the CPU sufficient, it is possible to use the mapped address space
by setting appropriate Translation Lookaside Buffer (TLB) entries.

Perhaps the biggest restriction with using the TLB is that individual entries
operate only on relatively large, aligned memory regions (pages).
Nevertheless, it may be helpful for memory protection or relocation of CPU
addresses. In addition, TLBs can be used as yet another method to reconcile
SP segment addresses with CPU addresses, since SP addresses fall within
the range of the mapped CPU address space.

The translation lookaside buffer (TLB) of the R4300 has 32 entries, each of
which maps two physical pages. The TLB is fully associative, which means
each entry is essentially independent; the index number implies nothing
about the mapping and any entry can hold any mapping. A number of page
sizes are supported: 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, and 16 MB. Each TLB
entry may map a different page size. The following routines are used to
manage the TLB:

• osMapTLB

This function sets the contents of a single TLB entry to the given virtual
address, even and odd physical address, page size, and address space
identifier.

• osUnmapTLB

This function invalidates both the odd and even physical page
mappings of a given TLB entry.

• osUnmapTLBAll

This function invalidates all mappings in the TLB. This should be done
by the application prior to using the TLB.

• osSetTLBASID

This function sets the current address space identifier register.

Using the TLB requires some care. The following paragraphs describe some
problem areas.

• Two TLB entries cannot map the same virtual address space. If this 
occurs, accesses to the address will cause a TLB refill exception. Any 
overlapping mapping creates this condition, even when a mapping 
with a smaller page size is a subset of another mapping with a larger 
page size: 

osMapTLB(0, OS_PM_16K, (void *)0x0, 0xa0000, -1, -1);
osMapTLB(1, OS_PM_4K, (void *)0x2000, 0xb000, -1, -1);

Another case involves different TLB entries, each of which maps a
different page of an odd/even pair. The following mappings, which
individually map an even and an odd physical page, also create an
overlap condition:

osMapTLB(0, OS_PM_4K, (void *)0x2000, 0xa000, -1, -1);
osMapTLB(1, OS_PM_4K, (void *)0x2000, -1, 0xb000, -1);

Instead, the application should set a single entry with both mappings:

osMapTLB(1, OS_PM_4K, (void *)0x2000, 0xa000, 0xb000, -1);
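
The overlap rule can be checked on a host machine before any mapping is installed. The following is a minimal sketch, not part of the SDK: the DemoTlbMapping type and the helper names are hypothetical, and it assumes that each entry claims two pages of virtual space (the even/odd pair) starting at a boundary of twice the page size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of one TLB entry; not an SDK type. */
typedef struct {
    uint32_t vaddr;      /* virtual address passed to osMapTLB */
    uint32_t page_size;  /* size of one page in bytes (4 KB, 16 KB, ...) */
} DemoTlbMapping;

/* An entry covers an even/odd page pair, so it spans 2 * page_size bytes
 * beginning at a boundary of that size (assumption stated above). */
static uint32_t demo_tlb_span_start(DemoTlbMapping m) {
    return m.vaddr & ~(2u * m.page_size - 1u);
}

/* Returns true when the virtual ranges claimed by the two entries intersect. */
static bool demo_tlb_overlap(DemoTlbMapping a, DemoTlbMapping b) {
    uint64_t a0 = demo_tlb_span_start(a);
    uint64_t a1 = a0 + 2ull * a.page_size;
    uint64_t b0 = demo_tlb_span_start(b);
    uint64_t b1 = b0 + 2ull * b.page_size;
    return a0 < b1 && b0 < a1;
}
```

With this model, both problem cases described above (a 4 KB mapping inside a 16 KB mapping, and two entries that split one even/odd pair) are reported as overlaps, while two entries at 0x2000 and 0x4000 with 4 KB pages are not.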

• The mapped addresses must be aligned to the page size. This applies to 
both the virtual and physical pages mapped. 

This implies that if one intends to map SP segment addresses via the
TLB, the SP segment must be loaded at a page-aligned address.

• Multiple mappings of a cached address must be of the same "color."
CPU caches are physically tagged but virtually indexed, which
introduces a situation in which more than one cache line references the
same physical memory locations. Avoid the problem by using the same
virtual address consistently for a particular physical address.

If you cannot use the same virtual address, the mappings should all be
the same color, where the "color" is defined as bits [14..6] of the
instruction address (for instruction fetches) or bits [15..5] of the data
address (for data accesses).
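
The color comparison is a plain bit-field extraction. The sketch below uses the bit ranges exactly as quoted in the text; the function names are hypothetical and nothing here is an SDK call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract bits [hi..lo] of an address, inclusive on both ends. */
static uint32_t demo_addr_bits(uint32_t addr, unsigned hi, unsigned lo) {
    uint32_t width = hi - lo + 1u;
    return (addr >> lo) & ((1u << width) - 1u);
}

/* Color for instruction fetches: bits [14..6], as given in the text. */
static uint32_t demo_icache_color(uint32_t vaddr) {
    return demo_addr_bits(vaddr, 14, 6);
}

/* Color for data accesses: bits [15..5], as given in the text. */
static uint32_t demo_dcache_color(uint32_t vaddr) {
    return demo_addr_bits(vaddr, 15, 5);
}

/* Two virtual addresses may alias the same cached physical data
 * only if their data colors match. */
static bool demo_same_data_color(uint32_t va, uint32_t vb) {
    return demo_dcache_color(va) == demo_dcache_color(vb);
}
```

A check like this can be run over every pair of virtual addresses an application intends to map to the same physical page.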

Finally, no support is provided for handling and recovering from TLB
misses. A TLB miss is an unrecoverable fault to the Nintendo 64 system.

More information about these topics can be found in the MIPS R4300
documentation.

Chapter 11

Graphics Microcode



Graphics are rendered in Nintendo 64 games by creating a graphics display
list and passing this display list to the RSP. In order for the RSP to process
this display list, the application, using system calls, loads graphics
microcode. This section discusses the different microcode object files
available to applications.

There are six basic versions of the graphics microcode, and each basic
version has up to three subtypes. The basic versions are known as gspFast3D,
gspF3DNoN, gspLine3D, gspTurbo3D, gspSuper3D, and gspSprite2D. Each
basic version has a different set of graphics rendering features. Each subtype
has the same set of graphics features, but varies according to how the RSP
passes commands to the RDP. The three subtypes are regular, .dram, and
.fifo. The object files for the microcode are labeled <basicType>.o,
<basicType>.dram.o, and <basicType>.fifo.o.



NU6-06-0030-001 G of October 21, 1996 131 



NINTENDO 64 PROGRAMMING MANUAL DRAFT 



Microcode Functionality 



gspFast3D 

gspFast3D microcode is the most full-featured of the microcode objects. It is
also the microcode used in the majority of the demo applications. gspFast3D
supports 3D triangles, 3D clipping, z-buffering, near and far clipping,
lighting, mip-mapped textures, perspective textures, fog, and matrix stack
operations. It does not support the GBI command gSPLine3D.



gspF3DNoN 

The gspF3DNoN microcode is similar to the gspFast3D microcode, except it
does not handle near plane clipping in the same manner. When using the
gspFast3D microcode, objects between the eye and the near plane are
clipped. When using the gspF3DNoN microcode, objects between the eye
and the near plane are not clipped. However, the area between the eye and
the near clipping plane does not implement z-buffering. This means that
objects that fall into this area must be drawn in order from far to near.



gspLine3D

gspLine3D microcode has many of the features of gspFast3D, except that
instead of drawing triangles, it draws 3D lines. This is useful for producing
wireframe effects. If a gSP1Triangle command is encountered, it will draw
the three edges of the triangle, but not the center portion of the triangle.



gspTurbo3D 

gspTurbo3D microcode is a reduced-feature, reduced-precision microcode
that delivers significantly faster performance. The features not supported by
gspTurbo3D are clipping, lighting, perspective-corrected textures, and
matrix stack operations. The quality of the anti-aliasing also suffers, due to 
the lack of precision used by gspTurbo3D. This loss of precision can also 
manifest itself as various visual artifacts, depending on the content. 
gspTurbo3D uses a different format for the display list.

gspSprite2D

gspSprite2D microcode is optimized for drawing 2D sprite images. Sprites
are implemented as textured screen rectangles. gspSprite2D does not
support 3D lines, 3D triangles, vertex operations, matrix operations,
lighting, or fog. All of the DP commands, such as blender modes and color
combiner modes, are supported. Z-buffering can be used to arrange the order
of the sprites from front to back.



gspSuper3D 

gspSuper3D is a reduced-precision microcode that supports the same
display list format as gspFast3D. This reduced precision will increase
performance, but it can cause visual artifacts. Although gspSuper3D uses the
same display list as gspFast3D, gspSuper3D does not support
perspective-corrected textures.

RSP to RDP command passing

All types of RSP microcode generate commands for the RDP. The method
used to pass the commands from the RSP to the RDP determines the suffix
used to name the microcode object. In the "regular" method the commands
are written to a buffer in DMEM, which can hold up to six RDP commands.
If the buffer fills, the next time the RSP tries to write a command it will stall
until there is space in the buffer. Microcode versions that use this type of
command passing have no special suffix, just ".o" appended to their name.

Alternatively, the RSP can write all the commands to a larger fifo buffer in
rdram. This helps to prevent the RSP from stalling when the RDP gets bound
by processing large triangles. Microcode that uses this method has the
".fifo.o" suffix appended to its name.

When using the fifo version of a microcode, the application must pass a
pointer to a buffer to be used as the fifo buffer in the task output_buff field.
The size of the fifo buffer is put in the output_buff_size field. In order for fifo
to have a positive effect on performance, the size of the buffer should be
greater than 1K.

The microcode also provides another option for the RSP to write all of the
RDP commands to an rdram buffer. In this case the application must start the
RDP task separately with a call to osDpSetNextBuffer(). (This form of
command passing is very useful for debugging in conjunction with the tool
dlprint, which can print display lists in a human-readable form.) Microcode
designed to use this method has the ".dram.o" suffix appended to its name.

Tasks using the .dram microcode need a pointer to a buffer in the
output_buff field of the task structure, and a size in the output_buff_size field.
Because RSP commands usually expand when converted into RDP
commands, this buffer needs to be larger than the size of the RSP display list.

Chapter 12

RSP Graphics Programming



This document describes the graphics state machine of the RCP, with a
particular focus on the RSP (see "RSP: Reality Signal Processor" on page 44).

The RSP is an R4000-like CPU with an 8-element vector unit, featuring a
small instruction memory, IMEM (4K bytes or 1K instructions), and a small
data memory, DMEM (4K bytes). Software running on this processor
implements a large portion of the geometry display pipeline.

In addition, the RSP provides visibility for all of the RCP functionality,
through a variety of software conventions and hardware exposure. All
"display lists" for the RCP graphics features must pass through the RSP.
There are several important features which require the application
programmer to be consciously aware of the distinctions between the RSP
and the RDP (and program each of them separately), but for the most part,
the RSP serves as the single interface between the application program and
the graphics pipeline:

Figure 12-1 Nintendo 64 Graphics Pipeline

[Figure: three pipeline stages in sequence. R4300: game processing,
animation, GBI assembly. RSP: 3D geometry transformation. RDP: polygon
rasterization and texturing.]

Topics covered in this document include:

• RSP overview 

• display list processing 

• matrix state 

• vertex state

• vertex lighting state

• texture state

• clipping and culling

• primitives

• controlling the RDP state

RSP Overview



A program which runs on the RSP is called a task; the application is
completely responsible for scheduling and invoking tasks on the RSP.

The interface between the application and the RSP task is accomplished with
a series of operating system calls, and a structure called the task list (or task
header) which is type OSTask (defined in sptask.h). The task list contains all
the information necessary to begin task execution, including pointers to the
microcode to run. This structure is filled in by the application program.

A detailed description of invocation of a task on the RSP is beyond the scope
of this section (see "RCP Task Management" on page 65), but the essential
procedure is straightforward:

• the RSP is assumed to be halted (or the R4300 halts it).

• the R4300 DMAs the boot microcode into the RSP IMEM.

• the R4300 DMAs the task header into the RSP DMEM.

• the R4300 sets the RSP PC to 0.

• the R4300 clears the RSP halt status (allowing it to run).

From this point, the boot microcode takes over, loading the task microcode
(and data) specified in the task list, and jumping to the beginning of the task.

One item in the task header is a pointer to the initial data to process (in the
case of a graphics task, this is a display list pointer).

Display List Format 

The display list which the gspFast3D, gspF3DNoN, or gspLine3D microcode
running on the RCP interprets is defined as a stream of 64-bit commands.

Applications written in C will usually use the interface from the file gbi.h,
which will be included via inclusion of ultra64.h. Although the construction
of display lists looks like a familiar series of function calls, they are actually
just bit-packing macros. These macros are described in detail in their
individual man pages.

Each macro has two forms, e.g. gSPTexture() and gsSPTexture(). The difference
between 'g' and 'gs' is that the 'g' form is an in-line form which requires an
additional argument (a pointer to the display list being constructed). The
display list pointer must be of the form "ptr++" in order for the macros to
work properly.

The 'gs' form is for static declarations, and generates the appropriate C
structure initialization sequence.

Throughout this document only the 'gs' form is mentioned; however, the 'g'
form also applies, and could always be substituted.
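
The relationship between the two macro forms can be illustrated with a miniature stand-in. This is a sketch only: DemoGfx, gsDemoCmd, gDemoCmd, and the opcode values are all hypothetical and are not the definitions found in gbi.h. It shows why the in-line form is normally handed a post-incremented pointer.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit display list command. */
typedef struct {
    uint32_t w0;
    uint32_t w1;
} DemoGfx;

/* 'gs' style: expands to a static structure initializer. */
#define gsDemoCmd(op, arg) { ((uint32_t)(op) << 24), (uint32_t)(arg) }

/* 'g' style: stores the same bits through a display list pointer.
 * The pointer argument is evaluated once, so passing p++ both writes
 * the command and advances the list. */
#define gDemoCmd(pkt, op, arg)                   \
    do {                                         \
        DemoGfx *g_ = (pkt);                     \
        g_->w0 = ((uint32_t)(op) << 24);         \
        g_->w1 = (uint32_t)(arg);                \
    } while (0)

/* Static declaration using the 'gs' form (arbitrary opcode values). */
static const DemoGfx demo_static_list[] = {
    gsDemoCmd(0x01, 0x1234),
    gsDemoCmd(0xB8, 0),
};

/* Dynamic construction using the 'g' form; returns commands written. */
static int demo_build_dynamic_list(DemoGfx *buf) {
    DemoGfx *p = buf;
    gDemoCmd(p++, 0x01, 0x1234);
    gDemoCmd(p++, 0xB8, 0);
    return (int)(p - buf);
}
```

Both paths produce identical command words; the only difference is whether the bits are laid down at compile time or at run time.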

All of the display list building macros embed an 'SP' or a 'DP' to
describe the functional unit of the RCP which will operate on this command.
This is certainly confusing, especially to application programmers familiar
with higher-level graphics APIs such as OpenGL. In order to achieve
maximum performance, it is necessary to expose the two major units of the
RCP to the application programmer. The primary reason for this is resource
constraints; there is simply not enough RSP IMEM to build a display list
processor that is rich enough to hide these details from the application
programmer. In addition, given the dedicated application of the RCP (video
games), any CPU cycles spent "gift-wrapping" the graphics API are a waste
of time. The binary encoding of most of the display list commands is the
lowest possible level: they are the bits that control the hardware.

Exposing the two functional units of the RCP also limits the amount of state
shared between them. The major drawback of this design decision is that
you must often tell the same thing to the RSP and the RDP. For example, in
order to "turn on texture mapping" you must turn it on in the RSP and turn
it on in the RDP. This may seem clumsy at first, and indeed this is a common
source of display list bugs, but the parallel execution of the RSP and RDP,
plus the lean display list processing machine, make this trade-off
worthwhile.



Segmented Memory and the RSP Memory Map 

All DRAM addresses in the display list are segmented addresses. The
mapping of segments and their base addresses is provided using the
gsSPSegment() macro. It is the responsibility of the application to maintain
this mapping and inform the RSP via the display list.

The RSP maintains an associative table of up to 16 segment IDs and their
base addresses. Any DRAM address in the display list is 'physical-ized'
using this table.
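
The table lookup can be modeled in a few lines of host-side C. This is a sketch under one assumption that this section does not state: the segment ID is taken from bits 24..27 of the address and the offset from the low 24 bits. The type and function names are hypothetical.

```c
#include <stdint.h>

enum { DEMO_NUM_SEGMENTS = 16 };

/* Hypothetical model of the RSP segment table. */
typedef struct {
    uint32_t base[DEMO_NUM_SEGMENTS];
} DemoSegmentTable;

/* Record the base address for one segment ID. */
static void demo_set_segment(DemoSegmentTable *t, unsigned id, uint32_t base) {
    t->base[id & 0x0Fu] = base;
}

/* Convert a segmented address to a physical address.
 * Assumed layout: bits 24..27 = segment ID, bits 0..23 = offset. */
static uint32_t demo_physicalize(const DemoSegmentTable *t, uint32_t seg_addr) {
    uint32_t id = (seg_addr >> 24) & 0x0Fu;
    uint32_t offset = seg_addr & 0x00FFFFFFu;
    return t->base[id] + offset;
}
```

Changing a single base address retargets every segmented address that uses that ID, which is what makes segment swapping an inexpensive way to switch between buffers.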

The RDP only uses physical addresses, and one of the chores of the RSP is to
do the address translation necessary for the RDP.

Note: By convention, segment table entry 0 is reserved for physical
addressing, and should be set to 0x0.

The RSP software can only access DMEM. All data must first be transferred
into DMEM using DMA operations, which must be 64-bit aligned.
Invocation of the DMA engine is handled by the RSP software, but the
application programmer needs to be aware of the boundary requirements.
Any data structure that is to be passed to the RSP must be aligned to a 64-bit
boundary. The structures in gbi.h use C unions to guarantee this.

Since the DMA engine is shared between the R4300 and the RSP, the
application program should also avoid unnecessary DMA activity while the
RSP is running.



Interaction Between the RSP and R4300 Memory Caching

The most prevalent example of communication between the CPU and the
RSP is that of the CPU creating a display list in DRAM for eventual
interpretation by the RSP. The display list data is read from DRAM via a
DMA mechanism. Unfortunately, DRAM locations may be "stale" with
respect to newer data being held in the R4300's data cache. The R4300 cache
mechanism implements a "write-back" caching policy, which means
individual stores to memory are not immediately written to memory. To
update the memory contents with more recent cached data, the CPU must
first write back cached data to the DRAM. Then, and only then, will the RSP
be able to DMA the correct data for display list processing.

Conversely, the contents of memory may be more recent than cached data in
some situations when the RSP modifies memory (an obvious example is
updating the color frame buffer). In this case, the CPU's cache may contain
stale data and the CPU should invalidate the cached data to force an access
directly to DRAM and get the most recent data.

As a practical note, this second scenario only arises in advanced
applications.

Display List Processing



Understanding the basics of RSP display list processing is necessary to
construct efficient, compact display lists for an application.

The display list (or command list) can be thought of as a hierarchical
structure, up to 10 levels deep. A display list may contain a pointer to
another display list, and so on. The RSP processes the display list using a
stack, pushing and popping the current display list pointer.

For animation, it will be desirable to "double-buffer" parts of the display list,
rendering one frame while the data for the next frame is updated. In this
case, only the minimum amount of data need be duplicated: only the data
which will change for each frame. Swapping between doubled buffers is
efficiently done by changing the segment base addresses (and organizing
your display list appropriately).

During computation by the RSP, all display lists and their data must remain
in the same location until the RSP is finished. This sounds obvious, but is a
very common bug, usually the result of incorrect usage of double-buffering
techniques. In addition, if the RSP task is interrupted (see "Signal Processor
(SP) Functions" on page 109), all of the data must remain in the same
location when/if the task is restarted.



Connecting Display Lists 

Hierarchical display list connection can be made with the gsSPDisplayList()
macro. The current display list location is pushed on the display list stack
and processing begins with the new display list.

Table 12-1 gsSPDisplayList(Gfx *dl)

Parameter   Values

dl          pointer to the display list to attach.

Branching Display Lists

A display list branch without a push allows you to "chain" together
fragments of display lists for more efficient memory utilization.

Table 12-2 gsSPBranchList(Gfx *dl)

Parameter   Values

dl          pointer to the display list to attach.

Ending Display Lists

All display lists must terminate with an "end" command.

Table 12-3 gsSPEndDisplayList(void)

Parameter   Values

none        none
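
The three commands above (call with push, branch without push, and end) can be modeled with a small interpreter. This is a host-side sketch with hypothetical names and opcodes; it assumes the stack holds ten saved pointers, which is one reading of "up to 10 levels deep".

```c
#include <stddef.h>

enum { DEMO_DRAW, DEMO_CALL, DEMO_BRANCH, DEMO_END };
enum { DEMO_MAX_DEPTH = 10 };

typedef struct DemoCmd {
    int op;
    const struct DemoCmd *target;  /* used by DEMO_CALL and DEMO_BRANCH */
} DemoCmd;

/* Walks a list and returns the number of DEMO_DRAW commands executed,
 * or -1 if nesting exceeds the stack or an opcode is unknown. */
static int demo_run(const DemoCmd *pc) {
    const DemoCmd *stack[DEMO_MAX_DEPTH];
    int depth = 0;
    int draws = 0;
    for (;;) {
        switch (pc->op) {
        case DEMO_DRAW:
            draws++;
            pc++;
            break;
        case DEMO_CALL:                /* like gsSPDisplayList: push, then jump */
            if (depth == DEMO_MAX_DEPTH) return -1;
            stack[depth++] = pc + 1;
            pc = pc->target;
            break;
        case DEMO_BRANCH:              /* like gsSPBranchList: jump, no push */
            pc = pc->target;
            break;
        case DEMO_END:                 /* like gsSPEndDisplayList: pop or finish */
            if (depth == 0) return draws;
            pc = stack[--depth];
            break;
        default:
            return -1;
        }
    }
}

static const DemoCmd demo_leaf[] = {
    {DEMO_DRAW, NULL},
    {DEMO_END, NULL},
};

static const DemoCmd demo_top[] = {
    {DEMO_CALL, demo_leaf},    /* runs leaf, then returns to the next command */
    {DEMO_DRAW, NULL},
    {DEMO_BRANCH, demo_leaf},  /* leaf's end command now finishes the whole list */
};
```

Note that demo_top needs no end command of its own: because the final command is a branch, the end of demo_leaf terminates processing.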



A Few Words about Optimal Display Lists 

The display list processor running on the RSP caches display list commands
in groups of about 32. This means the optimal display list size is a multiple
of 32. A display list of 33 commands (or 65, etc.) would require the display
list cache to be refilled during processing (possibly causing a wait state,
depending on the DMA engine activity). Obviously not all display lists can
keep the list processor running 100% optimally, but it is something to keep
in mind when tuning your application.
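
The cost of straying from a multiple of 32 is easy to estimate. The sketch below assumes a group size of exactly 32 commands (the text says "about 32"), and the helper names are hypothetical.

```c
enum { DEMO_DL_GROUP = 32 };  /* assumed group size */

/* Number of cache fills needed to process n commands. */
static unsigned demo_dl_fills(unsigned n) {
    return (n + DEMO_DL_GROUP - 1u) / DEMO_DL_GROUP;
}

/* Number of command slots left unused in the last fill. */
static unsigned demo_dl_wasted_slots(unsigned n) {
    return demo_dl_fills(n) * DEMO_DL_GROUP - n;
}
```

Under this assumption a 33-command list needs two fills and leaves 31 slots of the second fill unused, while a 32-command or 64-command list wastes none.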

Another form of display list that causes less than optimal processing is one
that consists largely of consecutive pointers to other display lists.

Since the display list engine is stack-based, a display list that has lots of
unnecessary indirect pointers will cause lots of unnecessary pushes and
pops, which do have a cost.

Constructs like this are unavoidable sometimes, such as when sharing
geometries among objects, but if you have a choice, try not to group indirect
display list pointers together.

Matrix State



The "geometry engine" in the RSP implements a fixed-point matrix engine 
with the following matrix state: 

A 10-deep modeling matrix stack. New matrices can be loaded onto the
stack, multiplied with the top of the stack, popped off of the stack, etc. This
matrix stack is primarily used for manipulating objects within the world
coordinate system (often combinations of rotations, translations, and
sometimes scales).

A 1-deep projection and viewing matrix "stack". New matrices can be
loaded onto the stack, multiplied with the top of the stack, but cannot be
pushed or popped. This matrix "stack" is primarily used for the projection
matrix and the viewing matrix. The projection matrix (often created with the
guPerspective or the guOrtho functions) is loaded onto the stack, and then
the viewing matrix (often created with the guLookAt function) is multiplied
on top of it.

A "perspective normalization" factor. This is used to improve precision of
the fixed-point perspective computation.

When a group of vertices is loaded, they are first transformed by the matrix
MP (the current top of the modeling stack multiplied by the projection
matrix). All vertex transformations are done only when they are loaded;
sending a new matrix down later will not change any points already in the
points buffer.

The modeling matrix stack resides in DRAM. It is the application's
responsibility to allocate enough memory for this stack and provide a
pointer to this stack area in the task list.

The format of a matrix is a bit unusual. It is optimized for the RSP's vector
unit (used during the multiplies and transformations). This format groups
all of the integer parts of the elements, followed by all of the fractional parts
of the elements. This unusual format is not exposed to the user, unless
he/she chooses not to use the matrix utilities in the libraries.
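
The grouping of integer and fractional parts can be sketched as follows. This is an illustration, not the library's conversion routine: it assumes each element is a signed 16.16 fixed-point value, and DemoSplitMtx and demo_split_matrix are hypothetical names. The exact packing used by the Mtx type is not described in this section.

```c
#include <stdint.h>

/* Hypothetical split layout: all integer halves, then all fraction halves. */
typedef struct {
    uint16_t int_part[16];
    uint16_t frac_part[16];
} DemoSplitMtx;

/* Convert a row-major 4x4 matrix of doubles into the split layout. */
static void demo_split_matrix(const double in[16], DemoSplitMtx *out) {
    for (int i = 0; i < 16; i++) {
        /* Signed 16.16 fixed point, truncating toward zero (assumed format). */
        int32_t fixed = (int32_t)(in[i] * 65536.0);
        uint32_t bits = (uint32_t)fixed;
        out->int_part[i] = (uint16_t)(bits >> 16);
        out->frac_part[i] = (uint16_t)(bits & 0xFFFFu);
    }
}
```

In this layout a value of 0.5 becomes integer half 0x0000 and fraction half 0x8000, and -0.5 becomes 0xFFFF and 0x8000 (two's complement).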
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Insert a Matrix 

Inserts a new matrix into the display list.

Table 12-4 gsSPMatrix(Mtx *m, unsigned int p)

Parameter   Values

m           pointer to the new matrix.

p           G_MTX_MODELVIEW or G_MTX_PROJECTION,
            G_MTX_MUL or G_MTX_LOAD,
            G_MTX_PUSH or G_MTX_NOPUSH

Pop a Matrix

This command pops the matrix stack.

Table 12-5 gsSPPopMatrix(unsigned int n)

Parameter   Values

n           unused

Perspective Normalization

This scale value is used to scale the transformed w coordinate down, prior to
dividing out w to compute the screen coordinates (which are similarly
scaled). The effect of this is to maximize the precision of this divide.

The library function guPerspective() returns one approximation for this scale
value, which is a good estimate for most cases:

Figure 12-2 Perspective Normalization Calculation

[Figure: the scale value computed from the near plane and far plane
distances, represented as an unsigned 16-bit fraction.]

This approximation normalizes w=1.0 halfway between the near
and far planes.
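
A scale that makes w equal 1.0 halfway between the planes must satisfy scale * (near + far) / 2 = 1, that is, scale = 2 / (near + far). The sketch below derives the 16-bit fraction from that statement alone; it is not claimed to be the exact code inside guPerspective(), and demo_persp_norm is a hypothetical name.

```c
#include <stdint.h>

/* Scale as an unsigned 0.16 fraction, derived from
 * scale * (near + far) / 2 = 1. Clamps when the scale would reach 1.0,
 * which a pure fraction cannot represent. */
static uint16_t demo_persp_norm(double near_z, double far_z) {
    double sum = near_z + far_z;
    if (sum <= 2.0) {
        return 65535u;
    }
    return (uint16_t)((2.0 * 65536.0) / sum);
}
```

For example, planes at 1.0 and 3.0 give 0x8000 (a scale of 0.5), so a w of 2.0, the midpoint, is scaled to 1.0.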



Table 12-6 gsSPPerspNormalize(unsigned short int s)

Parameter   Values

s           16-bit unsigned fractional perspective normalization scale.



Note on Coordinate Systems and Big Numbers 

The RSP is a fixed-point machine, so keeping coordinate systems within a
certain range is important. If numbers in the final coordinate system (or
intermediate coordinate systems) are too big, then the geometry of objects
can be distorted, textures can shift erratically, and clipping can fail to work
correctly. In order to avoid these problems keep the following notes in mind:

1) No coordinate component (x, y, z, or w) should ever be greater than
32767.0 or less than -32767.0.

2) The difference between any 2 vertices of a triangle should not have
any components greater than 32767.0.

3) The sum of the difference of the w's of any 2 vertices plus the
difference of any of the x, y, or z components should be less than
32767.0. In other words, for any 2 vertices in a triangle,
v1=(x1,y1,z1,w1) and v2=(x2,y2,z2,w2), these should all be true:

abs(x1-x2) + abs(w1-w2) < 32767.0
abs(y1-y2) + abs(w1-w2) < 32767.0
abs(z1-z2) + abs(w1-w2) < 32767.0

One way to check this is to take the largest vertices that you have and run
them through the largest matrices you are likely to have, then check to make
sure that these conditions are met.

A recommended way of avoiding trouble is to never allow any component
to get larger than 16383.0 or smaller than -16383.0. To ensure this, find:

M = the largest component (x, y, or z) of the largest model in your
database.

S = the largest scale (i.e. number in the upper 3 rows of the matrix) in
the matrix made up of the concatenation of the largest modeling matrix,
the largest LookAt matrix, and the largest Perspective matrix you will
use.

T = the largest translation (i.e. number in the 4th row of the matrix) in the
matrix made up of the concatenation of the largest modeling matrix, the
largest LookAt matrix, and the largest Perspective matrix you will use.

Now M * S + T < 16383.0 should be true. If you experience textures
wobbling or shifting over a surface, clipping not working correctly, or
geometry behaving erratically, this is a good place to check.
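
These range rules lend themselves to an offline check in a model-conversion tool. The following host-side sketch applies rule 1, rule 3, and the M * S + T budget; DemoVec4 and the function names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    double x, y, z, w;
} DemoVec4;

static double demo_abs(double v) {
    return v < 0.0 ? -v : v;
}

/* Rule 1: every component stays within +/-32767.0. */
static bool demo_vertex_in_range(DemoVec4 v) {
    return demo_abs(v.x) <= 32767.0 && demo_abs(v.y) <= 32767.0 &&
           demo_abs(v.z) <= 32767.0 && demo_abs(v.w) <= 32767.0;
}

/* Rule 3: per-axis difference plus the w difference stays below 32767.0. */
static bool demo_edge_in_range(DemoVec4 a, DemoVec4 b) {
    double dw = demo_abs(a.w - b.w);
    return demo_abs(a.x - b.x) + dw < 32767.0 &&
           demo_abs(a.y - b.y) + dw < 32767.0 &&
           demo_abs(a.z - b.z) + dw < 32767.0;
}

/* Recommended budget: M * S + T < 16383.0. */
static bool demo_budget_ok(double m, double s, double t) {
    return m * s + t < 16383.0;
}
```

For instance, a model whose largest component is 1000, with a largest scale of 10 and a largest translation of 5000, gives 15000 and passes; doubling the model size to 2000 gives 25000 and fails.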



A Few Words About Matrix Precision 

The RSP uses fixed-point 32-bit multiplies during matrix operations. Since
the product of two 32-bit numbers is a 64-bit number, only the middle 32 bits
of the answer are retained. Overflow of intermediate terms is possible,
especially in large coordinate systems or unusual projection matrices.
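
The effect of keeping only the middle 32 bits can be reproduced on a host machine. This sketch assumes signed 16.16 operands, so the retained bits are bits 16..47 of the 64-bit product; the names are hypothetical and the code models the arithmetic only, not the RSP instruction sequence.

```c
#include <stdint.h>

typedef int32_t demo_fix16;  /* signed 16.16 fixed point (assumed format) */

static demo_fix16 demo_fix_from_int(int v) {
    return (demo_fix16)((uint32_t)v << 16);
}

/* Multiply two 16.16 values and keep bits 16..47 of the 64-bit product.
 * Anything above bit 47 is silently lost, which is the overflow case.
 * The final conversion wraps to two's complement on common compilers. */
static demo_fix16 demo_fix_mul(demo_fix16 a, demo_fix16 b) {
    int64_t wide = (int64_t)a * (int64_t)b;
    return (demo_fix16)(uint32_t)((uint64_t)wide >> 16);
}
```

Multiplying 1.5 by 2.0 yields 3.0 as expected, but 256.0 times 256.0 yields 0, because the true result (65536.0) does not fit in the 16 integer bits that survive.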

In order to avoid fixed-point precision problems, in some cases it may be
desirable to compute the matrix in floating point on the R4300 and just load
it.

Matrix multiplies are very fast on the RSP, but they are not free. If possible,
reduce matrix operations by pre-multiplying the matrices at modeling time
or compile time.

Vertex State



The RSP state includes a vertex buffer, holding up to 16 vertices. This buffer
can be loaded with any number of consecutive vertices, beginning at any
location.

Table 12-7 gsSPVertex(Vtx *v, unsigned int n, unsigned int v0)

Parameter   Values

v           pointer to a list of vertices.

n           number of vertices.

v0          vertex buffer location to load vertices into.

At the time the vertices are loaded, they are transformed by the current
matrix state and possibly shaded by the current lighting state.

Vertices are not re-transformed; if the matrix state changes, the old
(previously transformed) vertices are not affected. This feature can be
exploited to construct data that is knit together between two groups of
points with different transformations (such as an elbow joint of a character).

Since the vertex processing is heavily vectorized and pipelined, it is
important that each load loads as many vertices as possible.

Since the vertex loading is a relatively slow operation, it is also important
that any triangles that share vertices be rendered using the same vertex state,
rather than re-loading these same vertices later.

See the "Note on Coordinate Systems and Big Numbers" on page 146 for
info on keeping your coordinates from becoming too big.

Texture State



The following command sets the RSP texture state: 

Table 12-8 gsSPTexture(int s, int t, int levels, int tile, int on)

Parameter   Values

s           s-coordinate texture scale (16-bit unsigned fraction)

t           t-coordinate texture scale (16-bit unsigned fraction)

levels      (maximum number of mip-map levels) - 1

tile        which tile in the texture memory

on          G_ON or G_OFF

As explained previously, a vertex's s and t coordinates are texel-space
coordinates in an S10.5 format. The texture coordinate usually ranges from 0
to (texel_size - 1), possibly larger to implement "wrapped" textures. The
maximum number of times that a texture may be wrapped is limited by the
number of integer bits in the coordinate.

Since the s and t coordinate texture scale parameters are only fractional
numbers, they cannot represent values >= 1.0. For non-scaled textures,
applications typically use a vertex texture coordinate format of S9.6, and a
scale value of 0.5 (0x8000 in 16-bit unsigned format).
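
The arithmetic behind the S9.6 convention can be worked through on a host machine. The sketch below is illustrative only: the helper names are hypothetical, and how the RSP applies the scale internally is an assumption. It shows that an S9.6 coordinate multiplied by a 0.5 scale lands on the same value as the texel expressed directly in S10.5.

```c
#include <stdint.h>

/* Scale factor as an unsigned 0.16 fraction; valid for 0.0 <= f < 1.0. */
static uint16_t demo_tex_scale(double f) {
    return (uint16_t)(f * 65536.0);
}

/* Texel coordinate as signed fixed point with frac_bits fractional bits. */
static int16_t demo_texcoord(double texel, unsigned frac_bits) {
    return (int16_t)(texel * (double)(1u << frac_bits));
}

/* Multiply a vertex coordinate by a 0.16 scale (assumed behavior). */
static int32_t demo_apply_scale(int16_t coord, uint16_t scale) {
    return ((int32_t)coord * (int32_t)scale) / 65536;
}
```

Texel 31 is 1984 in S9.6; scaled by 0x8000 it becomes 992, which is texel 31 in S10.5.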

The levels parameter tells the pipeline the maximum number of mipmap 
levels to use, if mip-mapping is enabled. 

The tile parameter tells the pipeline which of the 8 possible tiles in the RCP
texture memory to use when texturing the following primitives.

The on parameter turns texturing on or off in the RSP. If texturing is turned
off in the RSP, textured primitives will not be generated, regardless of the
RDP state.

Likewise, setting the RSP state is necessary, but not sufficient, to generate
textured primitives. The RDP state must also be set in the appropriate
manner; see "TX: Texture Engine" on page 186.

Texturing is sensitive to large numbers and overflows. Refer to the
"Note on Coordinate Systems and Big Numbers" in the Matrix State
section for notes on how to avoid texturing problems such as textures
shifting across surfaces, textures tearing, and edges between polygons
becoming visible in the texture.

Clipping and Culling



3D clipping is automatically enabled all the time. There are two modes
which can be adjusted for performance and appearance: ClipRatio and
NearClipping. See also "Scissoring" on page 184.

3D clipping is expensive and should be avoided. Methods employed by the
host application which can reduce the amount of geometry that gets clipped
are a good idea. Crude visibility determination algorithms, geometric
level-of-detail, and careful scene construction all help improve clipping
performance dramatically.

The clipping algorithm is sensitive to large numbers and overflows. Refer to
the "Note on Coordinate Systems and Big Numbers" in the Matrix State
section for notes on how to avoid clipping problems.

Clip Ratio

The Clip Ratio feature helps the application to clip less.

Generally (i.e. when ClipRatio is set to FRUSTRATIO_1) the RSP clips to the
viewing frustum, which is defined by the projection and viewing matrices
(often created using guPerspective and guLookAt respectively). This is the
area which is mapped by the gSPViewport command and usually
corresponds to the entire frame buffer. Objects outside this area are scissored
by the RDP, so clipping them is not necessary. The ClipRatio command can
set the area which is clipped between 1 and 6 times the size of the viewing
frustum. Polygons which are completely on the screen are drawn without
clipping. Polygons which are partially onscreen but completely within the
enlarged frustum are drawn without clipping (the extra portions are
scissored away). Polygons which are entirely offscreen are trivially rejected
(whether they are inside or outside the frustum). The only polygons which
are clipped are the large polygons which stretch all the way from onscreen
to outside the enlarged clipping boundary. There is some overhead for
drawing sections of polygons which are then scissored away, but it is much
smaller than the time to draw actual onscreen pixels and is usually faster
than clipping. Different values of ClipRatio can be tried to obtain the best
performance. High values of ClipRatio are suspected to be associated with
"texture shuffle" bugs, so if you see the texture shuffling you could try lower
values of ClipRatio.

To set the ClipRatio so that the clipping frustum is 3 times the size of the screen:

gsSPClipRatio(FRUSTRATIO_3),

You can use values of FRUSTRATIO_1 through FRUSTRATIO_6.
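
The three outcomes described for Clip Ratio (trivial reject, draw with scissoring only, and true clipping) can be sketched for a single screen axis. This is a simplified host-side model, not the RSP algorithm: it assumes the screen spans [-1, 1] in normalized units, treats one axis only, and uses hypothetical names.

```c
typedef enum {
    DEMO_REJECT,         /* entirely offscreen: trivially rejected */
    DEMO_DRAW_UNCLIPPED, /* drawn as-is; any offscreen part is scissored */
    DEMO_CLIP            /* reaches from onscreen past the enlarged boundary */
} DemoClipAction;

/* lo..hi is the polygon's extent along one axis; ratio is 1 through 6. */
static DemoClipAction demo_classify(double lo, double hi, int ratio) {
    double limit = (double)ratio;
    if (hi < -1.0 || lo > 1.0) {
        return DEMO_REJECT;
    }
    if (lo >= -limit && hi <= limit) {
        return DEMO_DRAW_UNCLIPPED;
    }
    return DEMO_CLIP;
}
```

A polygon spanning 0.5 to 2.5 must be clipped at a ratio of 1 but is merely scissored at a ratio of 3, which is the saving the feature is meant to provide.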

Near Clipping and gspF3E^M microcode 

3D clipping causes geometry which is a QU:tsi€te:!oi a 3D box called the 
"clipping Frustrum" toj&cjipped away (i£ not rendered). The left, right, top 
and bottom of this cUppjig; frustrum box correspond to the left, right, top, 
and bottom of the screfe How|yer the side facing towards the viewer and 
the side facing away from the viewer ?: cio not correspond to physical parts of 
the screen. The "far plane" is the side of the box farthest from the viewer. 
Objects whieMafe, farther away than this plane are not rendered. Likewise 
the "neaKpane v %the side of the box closest to the viewer. Objects which 
are closeffo the y||#el::%an this plane are not rendered. The near and far 
clipping f|bnes cln causlijisual problems. Objects which get too far away 
will suddenly dissappearjfs the cross the far clipping plane. Also, objects 
which get too close to .tfcifc viewer will suddenly dissappear as the cross the 
h'ear clipping plane?:* 8 **"' 

There is a solution to these problems. The near plane problem can be
partially solved by using the gspF3DNoN microcode (which is an acronym
for Fast 3D No Near clipping). The gspF3DNoN microcode will not clip
objects between the viewer and the near clipping plane (objects which
would have been clipped away by the gspFast3D microcode). However, Z
buffering will not work correctly in this area. Objects between the viewer
and the near plane will hide objects which are behind the near plane, but
objects between the viewer and the near plane will not correctly hide other
objects between the viewer and the near plane. For this reason it is
important for the application to ensure that only one object at a time comes
closer to the viewer than the near plane.

There is a solution to the far plane problem too. Objects which get farther 
away from the viewer than the far plane visually "pop" out of view, and
objects approaching the viewer "pop" into view. The Fog effect can be used 
to make objects gradually fade into a distant fog, or slowly appear through 
a distant fog, instead of popping into and out of view. See the Vertex Fog 
State section for details. 
Back-Face Polygon Culling

The geometry engine of the RSP implements a flexible polygon culling
algorithm; either the front-facing, the back-facing, neither, or both types of
polygons can be culled before rasterization.

This offers the programmer the most database flexibility. Geometry can be
ordered in any direction or re-used with different culling flags in order to
achieve effects such as interior surfaces, 2-sided polygons, etc.

Table 12-9 gsSPSetGeometryMode(unsigned int n)

Parameter Values

G_CULL_FRONT
G_CULL_BACK
G_CULL_BOTH

Table 12-10 gsSPClearGeometryMode(unsigned int n)

Parameter Values

G_CULL_FRONT
G_CULL_BACK
G_CULL_BOTH



Volume Culling

The RCP can perform volume culling. The volume of an object is described
to the RCP and the RCP only draws the object if the described volume is
entirely or partially onscreen. If the volume is entirely offscreen then the
display list is quickly skipped.

The volume of an object is described with a number of vertices surrounding
the object. The vertices may be part of the object or not. They can be 4
vertices describing a pyramidal volume, 8 points describing a cube, or any
other convex shape. These vertices should be sent to the RCP using a
gSPVertex command just like regular vertices (note: you may want to turn
lighting and fog off when these vertices are sent for better performance).
Then the gsSPCullDisplayList command is sent. If the volume is entirely off
the screen then the command acts like gsSPEndDisplayList and the rest of
the display list is skipped. Otherwise the command acts as a NOOP and the
display list processing continues.
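
The bounding volume can be worked out ahead of time from the object's own vertex positions. The following is a minimal sketch, assuming the positions are available as plain integer triples; the Point3 type and the bounding_box_corners function are hypothetical helpers for illustration and are not part of the Nintendo 64 libraries. It produces the 8 corner points of an axis-aligned box around the object, which would then be copied into vertex structures and sent with gSPVertex before the gsSPCullDisplayList command.

```c
#include <stddef.h>

/* Hypothetical helper type: one vertex position in model space. */
typedef struct { short x, y, z; } Point3;

/*
 * Compute the 8 corners of the axis-aligned box enclosing pts[0..n-1].
 * Returns 0 on success, -1 if there are no points.
 */
int bounding_box_corners(const Point3 *pts, size_t n, Point3 out[8])
{
    Point3 lo, hi;
    size_t i;

    if (pts == NULL || n == 0)
        return -1;

    /* Track the smallest and largest value seen on each axis. */
    lo = hi = pts[0];
    for (i = 1; i < n; i++) {
        if (pts[i].x < lo.x) lo.x = pts[i].x;
        if (pts[i].y < lo.y) lo.y = pts[i].y;
        if (pts[i].z < lo.z) lo.z = pts[i].z;
        if (pts[i].x > hi.x) hi.x = pts[i].x;
        if (pts[i].y > hi.y) hi.y = pts[i].y;
        if (pts[i].z > hi.z) hi.z = pts[i].z;
    }

    /* Bit 0 selects x, bit 1 selects y, bit 2 selects z (0 = min, 1 = max). */
    for (i = 0; i < 8; i++) {
        out[i].x = (i & 1) ? hi.x : lo.x;
        out[i].y = (i & 2) ? hi.y : lo.y;
        out[i].z = (i & 4) ? hi.z : lo.z;
    }
    return 0;
}
```

A box is only one choice; any convex set of surrounding vertices would serve, and a tighter volume culls more often at the cost of more vertices to transform.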
Vertex Lighting State



The RCP graphics pipeline provides a number of sophisticated real-time
lighting effects, including ambient (uniform) lighting, diffuse (directional)
lights, specular highlights, and automatic texture coordinate generation (fog
is discussed in its own section later). To achieve these effects and perform the
lighting operations, the following steps must be carried out:

1) Reference the gspFast3D microcode in the "spec" file.

2) Replace colors with normal components in the vertices of objects to
be rendered.

3) Define light structures with the parameters of the directional and
ambient lights and send them to the RCP.

4) Modify the state of the RCP to "turn on" lighting.

5) Define a texture map of the shape of the specular highlights to be
used and describe them to the RCP.

6) Define structures with the parameters of specular highlights and
send them to the RCP.

7) Render the objects.

Steps 1), 2), 3), 4) and 7) are required for diffuse and ambient lighting. All
steps are required for specular lighting. These steps are described in further
detail below.



RSP Microcode

Lighting requires the gspFast3D or gspF3DNoN microcode. This microcode
must be referenced in the "spec" file when the ROM image is created. The part
of the microcode that performs the lighting calculations is not normally
resident, but is brought in through an overlay when lighting calls are made.
This has performance implications for rendering scenes with some objects
lighted and others colored statically. Moreover, the lighting overlay
overwrites the clipping microcode, so to achieve highest performance, it is
best to minimize or avoid completely clipped objects in lighted scenes.
Normal Vector Normalization



To light an object, the vertices which make up the object must have normals
instead of colors specified. The normal consists of 3 signed 8-bit numbers
representing the x, y, and z components of the normal. Each component
ranges in value from -128 to +127. The x component goes in the position of
the red color of the vertex, the y into the green, and the z into the blue. Alpha
remains unchanged. The normal vector must be normalized. This means
that square_root(x*x + y*y + z*z) == 127. To normalize the normal (x,y,z)
determine d = 127/square_root(x*x + y*y + z*z). Then form XN=x*d;
YN=y*d; ZN=z*d. The normalized normal vector is (XN,YN,ZN). (Note the
libultra/gu square_root function is sqrtf().)



Ambient and Directional Lighting 

Lighting helps achieve the effect of depth by altering the way objects appear
as they change their orientation. The RSP microcode supports up to 7
directional lights and 1 ambient light in a scene. Each directional light has a
direction and a color. Ambient lights have color only. Regardless of the
orientation of the object and the viewer, each directional light will continue
to shine in the same direction (relative to the "world") until the light
direction is changed. In addition, one ambient light provides uniform
illumination. Shadows are not explicitly supported.

Important note on Matrix Manipulation 

It is important, when lighting, that the projection matrix and the viewing
matrix (i.e. matrices which describe the view into the world coordinate
system) be placed on the projection matrix stack (G_MTX_PROJECTION),
while matrices used to describe the position and orientation of objects within
the world coordinate system are placed on the modeling matrix stack
(G_MTX_MODELVIEW).

Light Structure Definition 

Lighting information is passed to the RSP in light structures. Since the
number of diffuse lights can vary from 0 to 7, there are 8 macros used to
define lights: gdSPDefLights0, gdSPDefLights1, gdSPDefLights2, ... ,
gdSPDefLights7. The number which is the last character in the macro
signifies the number of diffuse lights in the scene. Correspondingly, the
number of diffuse lights to be rendered determines which macro to use in
defining the light structure. There is always one ambient light.

To define a light structure use gdSPDefLights# where # is the number of
diffuse lights to be turned on. For example, for 3 lights:

    Lights3 light_structure1 = gdSPDefLights3(
        ambient_red, ambient_green, ambient_blue,
        light1red, light1green, light1blue,
        light1x, light1y, light1z,
        light2red, light2green, light2blue,
        light2x, light2y, light2z,
        light3red, light3green, light3blue,
        light3x, light3y, light3z);

will define a structure called light_structure1 with an ambient light and 3
directional lights. The variables with red, green, blue suffixes represent the
color of the light and take on values ranging from 0 to 255. The variables
with the x, y, z suffixes represent the direction of the light and take on the
range from -128 to +127. The light direction does not need to be normalized.
The convention is that the light direction points toward the light. This means
the light direction indicates the direction TO the light and NOT the direction
that the light is shining. Note the direction the light is shining is the negative
of the light direction. For example if the light is coming from the upper left
of the world, the direction might be x=-80, y=80, z=0. If this diffuse light is
green, and the ambient light is red, this structure would be defined by:

    Lights1 my_light = gdSPDefLights1(
        /* ambient color red */
        255, 0, 0,
        /* green light from the upper left */
        0, 255, 0, -80, 80, 0);
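
Because the convention is easy to get backwards, a small conversion step can help when a tool or artist supplies the direction in which the light shines. The following is a minimal sketch; light_dir_from_shine_dir is a hypothetical helper, not part of the Nintendo 64 libraries, and the clamping behavior is an assumption of this sketch.

```c
/*
 * Hypothetical helper: convert the direction a light is shining into the
 * "direction TO the light" expected in the light structure, which is its
 * negative. The result is clamped to -128..127 because negating -128
 * would give +128, which does not fit in a signed 8-bit component.
 */
void light_dir_from_shine_dir(const int shine[3], signed char dir[3])
{
    int i;

    for (i = 0; i < 3; i++) {
        int v = -shine[i];
        if (v > 127)  v = 127;
        if (v < -128) v = -128;
        dir[i] = (signed char)v;
    }
}
```

A light in the upper left shines toward the lower right, for instance (80, -80, 0), and the helper returns (-80, 80, 0), matching the direction used in my_light above.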

To avoid any ambient light, make the ambient light black (0,0,0). To include
only ambient light, and no diffuse directional light, use gdSPDefLights0:

    Lights0 my_ambient_only_light = gdSPDefLights0(
        /* blue ambient light */
        0, 0, 255);
Note on Light Direction

The light direction does not need to be normalized. However, there are some
problems that can arise from using light directions with magnitudes that are
too large or too small. The light direction is multiplied times the Modelview
Matrix (actually the transpose of the model matrix). If the Modelview
matrix has a scale associated with it then the light direction might overflow
or underflow. If the Modelview matrix has a scale S associated with it and
the magnitude of the light direction is L then you should ensure that

    1 < L*S < 23040

in order to keep the light working consistently. If L*S is too big then the
normalization of the lights will overflow and you will get lights that are too
bright. If L*S is too small then the normalization will underflow and you
will get lights that are too dim. Note the number 23040 comes from the
formula (L/128)*S < sqrt(32768) because the result of the matrix multiply of
L (which is an s.7 number, thus the /128) times the matrix (thus S, the scale of
the matrix, which is an s15.16 matrix) must produce a number which can be
squared (thus the square root) to produce a number which is s.15 (up to
32768).
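
The bound on L*S can be checked when lights and matrices are authored. The following is a minimal sketch, assuming the Modelview matrix applies a single uniform positive scale; light_scale_ok is a hypothetical helper, not part of the Nintendo 64 libraries.

```c
/*
 * Hypothetical helper: returns 1 if a light direction (x, y, z) used with
 * a Modelview matrix of uniform scale S satisfies 1 < L*S < 23040, where
 * L is the magnitude of the direction; returns 0 otherwise.
 * Both sides are compared as squares, which avoids taking a square root.
 */
int light_scale_ok(int x, int y, int z, double scale)
{
    double l2s2;

    if (scale <= 0.0)
        return 0;

    /* (L*S)^2 = (x*x + y*y + z*z) * S*S */
    l2s2 = ((double)x * x + (double)y * y + (double)z * z) * scale * scale;

    return l2s2 > 1.0 && l2s2 < 23040.0 * 23040.0;
}
```

With the direction (-80, 80, 0), whose magnitude is about 113, a scale of 1 passes, while scales of 0.001 or 1000 fall outside the range.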

Lighting State Set Up 

To activate a set of lights in a display list use the macros: gsSPSetLights0,
gsSPSetLights1, gsSPSetLights2, ... , gsSPSetLights7. For example, the
following macros would activate the lights defined in the examples above

    gsSPSetLights3(light_structure1), or
    gsSPSetLights1(my_light), or
    gsSPSetLights0(my_ambient_only_light),

in a static display list. (To activate the lights in a display list dynamically the
corresponding gSPSetLights# macros would be used.) Once lights are
activated, they will remain on until the next set of lights is activated. This
implies that setting up a new structure of lights overwrites the old structure
of lights in the RSP.

To turn on the lighting computation so that the lights can take effect, the 
lighting mode bit needs to be turned on. This is accomplished using the 
macro: 
    gsSPSetGeometryMode(G_LIGHTING)

Object Rendering 

Objects are rendered by issuing geometric primitive commands (see
Primitives section). The objects drawn will use lighted colors instead of
vertex colors. This means any color combiner mode will use lighted colors in
the combination operation in a manner exactly analogous to vertex color use
in non-lighted rendering. Note that lighting is performed at vertex
processing time. Therefore it is important that lighting state be established
prior to gSPVertex and gsSPVertex commands describing vertices in a lit
primitive. Lighting state established between a gSPVertex command and a
gSP1Triangle command will have no effect on that triangle.

NOTE ON MATERIAL PROPERTIES

Material properties are not explicitly supported. Instead material colors and
light colors have been combined in the Light structure. To obtain the correct
light color in a particular situation, multiply the color of the material
times the color of the light for each light source and use the result as the light's
color. Since colors range from 0 to 255, the result will have to be normalized
by dividing by 255 in order to obtain a resulting light color in the 0 to 255
range. In other words, if your material color is (mr, mg, mb) and your light
is (lr, lg, lb) then the light color you would use would be (mr*lr/255,
mg*lg/255, mb*lb/255). For example to light a purple object
(color=255,0,255) with yellow ambient light (color=255,255,0) and cyan
directional light (color=0,255,255) you could use:

    Lights1 material1_light = gdSPDefLights1(
        /* ambient color red = purple * yellow */
        255, 0, 0,
        /* blue directional light = purple * cyan */
        0, 0, 255, -80, -80, 0);
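
The per-channel multiplication can be sketched in C as follows. This is a minimal illustration of the (mr*lr/255, mg*lg/255, mb*lb/255) rule; modulate_channel and modulate_color are hypothetical helper names, not part of the Nintendo 64 libraries, and integer division is assumed to be acceptable for the rounding.

```c
/*
 * Hypothetical helper: combine one 0..255 material channel with one
 * 0..255 light channel, giving a 0..255 result (material * light / 255).
 */
unsigned char modulate_channel(unsigned char material, unsigned char light)
{
    return (unsigned char)((unsigned int)material * light / 255u);
}

/* Apply modulate_channel to the red, green and blue channels in turn. */
void modulate_color(const unsigned char material[3],
                    const unsigned char light[3],
                    unsigned char out[3])
{
    int i;

    for (i = 0; i < 3; i++)
        out[i] = modulate_channel(material[i], light[i]);
}
```

Applied to the purple object, the yellow ambient light becomes (255, 0, 0) and the cyan directional light becomes (0, 0, 255), which are the values placed in the light structure.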

If you then want to change the material color (eg to light an object of 
different color) you can define a 2nd Light structure with different light 
colors but the same directions and send it to the RCP after the first object's 
vertices and before the second object's vertices. For example to light a second
object which is yellow (color=255,255,0) with the same yellow and cyan light
as above you could use:

    Lights1 material2_light = gdSPDefLights1(
        /* ambient color yellow = yellow * yellow */
        255, 255, 0,
        /* green directional light = yellow * cyan */
        0, 255, 0, -80, -80, 0);

PERFORMANCE NOTE: the gSPSetLights# macros incur a certain
overhead when they are called in order to recalculate the new position of the
light. If the colors of the lights are being altered but the directions will
remain the same you can use the gSPLight macro to send the new light
structure after the 1st primitive's vertex command and before the second
primitive's. Note that the directional lights are always referred to as lights
1-N (where N is the number of directional lights in the scene) and the
ambient light is always referred to as light N+1. For the example above, the
entire sequence would look like:

    gsSPSetGeometryMode(G_LIGHTING),
    gsSPSetLights1(material1_light),
    gsSPVertex( /* define vertices for object 1 */ ),
    /* render object 1 here */
    gsSPLight(&material2_light.l[0], LIGHT_1),
    gsSPLight(&material2_light.a, LIGHT_2),
    gsSPVertex( /* define vertices for object 2 */ ),
    /* render object 2 here */

Specular Highlights

A specular highlight is the bright spot that shiny objects exhibit when the
viewing direction lines up properly with a highly directional light source. It
is caused by the light from the light source being directly reflected into the
eye of the observer. A specular highlight appears on a shiny object wherever
the normal of the object bisects the angle between the direction of the light
and the direction of the eye. The gspFast3D microcode can support zero, one,
or two specular highlights on an object. If there are more than 2 lights in a
scene, a quite impressive specular highlight effect can still be achieved by
choosing the two most important lights and rendering the highlights from
them. Specular highlights use texture mapping so specular highlights
cannot usually be used with texture mapped surfaces. Specular highlighting,
when combined with diffuse lighting (described above), can produce very
realistic looking surfaces. While specular highlighting is not required to be
on when diffuse lighting is on, diffuse lighting must be on when specular
lighting is on. However, the specular highlights do not necessarily have to
correspond to the diffuse lights at all.

A specular highlight is basically a reflection of a light source. To render it on
the RCP requires a texture map of an image of the light. The specular
highlight from most lights can be represented by a round dot with an
exponential or gaussian function representing the intensity distribution. If
the scene contains highlights from other, oddly shaped lights such as
fluorescent tubes or glowing swords, the difficulty in rendering is no greater
provided a texture map of the highlight can be obtained. The center of the
image of the light should be in the center of the texture map and the texture
map must be a power of 2 in width and height. In general shinier objects
reflect smaller, sharper highlights. A dull object might have a large white
dot for a specular highlight whether it is lit by a glowing sphere or a flaming
sword. A sharp metallic object would reflect the sword as a picture of the
sword and the texture maps used for highlighting different types of objects
can portray this difference. Note that many objects, such as human skin and
cloth, which reflect specular highlights to some extent, often can benefit
more from a regular texture map (e.g. hair on the body or a pattern on the
cloth). Since these materials are not shiny the texture mapping ability may be
better spent on a conventional texture map.

Specular Highlight Structure Definition

Specular lighting information is passed to the RSP in structures, analogous
to the diffuse light case. The utility procedure guLookAtHilite fills in the
elements of 2 structures, Hilite and LookAt, for use in highlighting. To
accomplish this, the two structures must be part of the dynamic segment
declared as

    Hilite hilite;
    LookAt lookat;

and guLookAtHilite must be called for each object in the following manner:

    guLookAtHilite(&throw_away_matrix, &lookat, &hilite,
        Eyex, Eyey, Eyez,
        Objectx, Objecty, Objectz,
        Upx, Upy, Upz,
        light1x, light1y, light1z,
        light2x, light2y, light2z,
        tex_width, tex_height);

where the arguments in common with guLookAt have the same meaning. 
Objectx, Objecty, and Objectz are the world coordinates of the center of the
object. light1x, light1y, and light1z are the direction of the light which is
reflected in the 1st highlight (should be the same as the direction specified in
the gdSPDefLights# macro). light2x, light2y, and light2z are the direction of
the light which causes the second highlight (if you are only using one
highlight these may be zero). tex_width and tex_height are the size of the
texture to be used for the highlight and must be powers of 2.

The information in the LookAt structure is sent to the RSP with the LookAt
macro:

    gsSPLookAt(&lookat),

Texture Loading 

The texture for the highlights must be loaded with gsDPLoadTextureBlock
or similar loadblock command. For example, the following call loads a
tex_width by tex_height 4-bit intensity texture:

    gsDPLoadTextureBlock_4b(hilight_texture, G_IM_FMT_I,
        tex_width, tex_height, 0,
        G_TX_WRAP | G_TX_NOMIRROR,
        G_TX_WRAP | G_TX_NOMIRROR,
        tex_width_power2,
        tex_height_power2,
        G_TX_NOLOD, G_TX_NOLOD),

where tex_width_power2, tex_height_power2 are the logarithms to the base
2 of the texture width and height. Note that wrapping must be turned on,
and the texture sizes must be a power of 2 for proper operation. The texture
loadblock macro sets a texture tile with the parameters necessary for
rendering one texture, and thereby one of the specular highlights. Setting a
second texture tile with the parameters for rendering a second specular
highlight can be done by loading another texture, but generally the same
texture can be used for both highlights. Instead, setting up a second tile if the
specular highlights are sharing one texture map can be accomplished with a
set tile call. The example following assumes the same 4 bit intensity texture
as used for the first highlight:

    gsDPSetTile(G_IM_FMT_I, G_IM_SIZ_4b,
        ((tex_width/2)+7)>>3,
        0, G_TX_RENDERTILE+1, 0,
        G_TX_WRAP | G_TX_NOMIRROR,
        tex_width_power2, G_TX_NOLOD,
        G_TX_WRAP | G_TX_NOMIRROR,
        tex_height_power2, G_TX_NOLOD),
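
The two derived quantities in these calls, the base 2 logarithm of a texture dimension and the line size expression ((tex_width/2)+7)>>3, can be computed with small helpers. The following is a minimal sketch; log2_pow2 and tile_line_4b are hypothetical names, not part of the Nintendo 64 libraries, and the reading of the line size as a count of 8-byte words is this sketch's interpretation of the expression above.

```c
/*
 * Hypothetical helper: returns log2(n) if n is a power of 2, else -1.
 * A result of -1 signals a texture size that the highlight and
 * reflection paths described here would not accept.
 */
int log2_pow2(unsigned int n)
{
    int shift = 0;

    if (n == 0 || (n & (n - 1)) != 0)
        return -1;              /* zero, or more than one bit set */
    while ((n >>= 1) != 0)
        shift++;
    return shift;
}

/*
 * The line size expression from the set tile example for a 4-bit texture:
 * tex_width/2 bytes per row, rounded up to whole 8-byte units.
 */
unsigned int tile_line_4b(unsigned int tex_width)
{
    return ((tex_width / 2) + 7) >> 3;
}
```

For a 32 by 32 texture, tex_width_power2 and tex_height_power2 are both 5 and the line size is 2.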

Texture Coordinate Transformations

Specular highlighting utilizes the projection of the vertex normals in the x
and y directions in screen space to derive the s and t indices respectively for
referencing the texture. The normals must be normalized as described
above. The normal projections are scaled to obtain the actual s and t values
for the reference. The scaling is applied in the RSP. It maps the negative most
projection of a unit normal, or -1, into zero. It maps the positive most
projection, or +1, into a scale value passed in through the gsSPTexture
command. Suppose the maximum texture s, t coordinates are tex_s_max and
tex_t_max. The following command sets the scale, so that a normal projection
of +1 in the x direction in screen space will be mapped to the texel with s
coordinate tex_s_max:

    gsSPTexture((tex_s_max)<<6, (tex_t_max)<<6, 0,
        G_TX_RENDERTILE, G_ON),

The left shift of argument by 6 bits is done to account for the S10.5 16-bit
internal representation of the texture coordinates (see Texture State below)
and a multiplication by one-half in the microcode.
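
The endpoint behavior described above (-1 maps to zero, +1 maps to the scale value) can be sketched as a plain function. This is an illustration only, assuming the mapping is linear between the two stated endpoints; projection_to_texcoord is a hypothetical helper that works in floating point and does not reproduce the fixed-point arithmetic of the microcode.

```c
/*
 * Hypothetical helper: map a normal projection p in [-1, +1] linearly
 * onto [0, tex_max], so that -1 gives 0 and +1 gives tex_max.
 * Inputs outside [-1, +1] are clamped.
 */
float projection_to_texcoord(float p, float tex_max)
{
    if (p < -1.0f) p = -1.0f;
    if (p >  1.0f) p =  1.0f;

    /* Shift [-1, +1] to [0, 2], halve to [0, 1], then apply the scale. */
    return (p + 1.0f) * 0.5f * tex_max;
}
```

A normal pointing straight at the viewer has projections of 0 in x and y, so it lands at the middle of the texture, which is why the image of the light should be centered in the texture map.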

Highlight Position Description 

After the texture is loaded, the highlight position information must be sent
to the RSP. This information is contained in the Hilite structure, and is sent
to the RSP with the following macros:

    gsDPSetHilite1Tile(G_TX_RENDERTILE, &hilite,
        tex_width, tex_height),

    gsDPSetHilite2Tile(G_TX_RENDERTILE+1, &hilite,
        tex_width, tex_height),

where both highlights share the same texture. 
Lighting State Set Up

Specular highlighting requires the lighting and texture generation mode bits
to be turned on using the macro

    gsSPSetGeometryMode(G_LIGHTING | G_TEXTURE_GEN),

Object Rendering 

As with diffuse lighting, objects are rendered by issuing geometric primitive
commands (see Primitives section). For two specular highlights, the 2 cycle
mode can be used, with a cycle devoted to each highlight. In addition, since
each highlight can have a different color, two registers are needed to hold the
colors for combining. The Primitive Color register holds the first highlight's
color and the Environment register holds the second highlight's color. As an
example, the following calls:

    gsDPSetCycleType(G_CYC_2CYCLE),
    gsDPSetEnvColor(0, 255, 255, 255), /* cyan */
    gsDPSetPrimColor(0, 0, 255, 255, 0, 255), /* yellow */
    gsDPSetRenderMode(G_RM_PASS, G_RM_AA_ZB_OPA_SURF2),
    gsDPSetCombineMode(G_CC_HILITERGBA, G_CC_HILITERGBA2),

set up rendering of a cyan and a yellow highlight in opaque z-buffered
antialiased mode. Note that for most materials the highlight color is the
same as the light's color, in contrast to the diffuse light case where the
resultant color is often affected by the color of the object it is striking
(although metallic objects like gold and brass usually have material-colored
highlights).

Reflection Mapping

Reflection mapping maps a texture onto an object using the normals of the
object to specify where on the object the texture will be mapped. If this 
texture is an image of the surroundings of the object, then this rendering will 
make the object appear to reflect its surroundings. This effect simulates the 
rendering of objects made of chrome or having a highly reflecting, 
mirror-like surface. 
Structure Definition

As with diffuse and specular lighting, information for reflection mapping is
passed to the RSP in a structure. The utility procedure guLookAtReflect fills
in the elements of a LookAt structure for use in reflection mapping. To
accomplish this, the structure must be part of the dynamic segment, declared
as

    LookAt lookat;

and guLookAtReflect must be called for each object in the following manner:

    guLookAtReflect(&throw_away_matrix, &lookat,
        Eyex, Eyey, Eyez,
        Objectx, Objecty, Objectz,
        Upx, Upy, Upz);

where the arguments in common with guLookAt have the same meaning.
Objectx, Objecty, and Objectz are the world coordinates of the center of the
object.

The LookAt structure contains information about the orientation of the
object relative to the viewing direction. This information is sent to the RSP
with the LookAt macro:

    gsSPLookAt(&lookat)

Texture Loading

The texture for reflection mapping must be loaded with a loadblock
command such as gsDPLoadTextureBlock, described in the example above.
As in the specular highlighting case, wrapping must be turned on, and the
texture sizes must be a power of 2 for proper operation.

Texture Coordinate Transformations

Reflection mapping utilizes the projection of the vertex normals in the x and
y directions in screen space to derive the s and t indices respectively for
referencing the texture. The normals must be normalized as described
above. The normal projections are scaled to obtain the actual s and t values
for the reference. The scaling is applied in the RSP. It maps the negative most
projection of a unit normal, or -1, into zero. It maps the positive most
projection, or +1, into a scale value passed in through the gsSPTexture
command. Suppose the maximum texture s, t coordinates are tex_s_max and
tex_t_max. The following command sets the scale, so that a normal projection
of +1 in the x direction in screen space will be mapped to the texel with s
coordinate tex_s_max:

    gsSPTexture((tex_s_max)<<6, (tex_t_max)<<6, 0,
        G_TX_RENDERTILE, G_ON),

The left shift of argument by 6 bits is done to account for the S10.5 16-bit
internal representation of the texture coordinates (see Texture State below)
after a multiplication by one-half in the microcode.

The texture coordinate transformation depends on the geometry mode of the
RSP. Two modes are supported, regular and linear.

The first mode (regular) derives the texture coordinates from the x and y
projection values, multiplied by the above mentioned scale. In this mode
the S coordinate represents the X component in world coordinates of the
direction from the object to the point which should be reflected. The T
coordinate represents the Y component. This means that your texture map
should represent the following mapping: 1) The center of the texture map is
what is directly behind you. 2) The circle inscribed in the texture map
boundaries is what is directly in front of you. 3) The circle with a radius of
0.707 times the radius of the circle in 2) is the objects directly to your left,
right, up, down, etc. 4) Other points map respectively.

The second mode (linear) derives the texture coordinates from the inverse
cosine of the x and y projection values, multiplied by the scale. In this mode
the S coordinate is the angle of the direction of the reflected vector in the XZ
plane. The T coordinate is the angle of the direction in the YZ plane. This
mode is useful because you can use a panoramic picture of the horizon for
your texture map. The center of the texture map should be the horizon
directly behind you. The extremes of the texture map to the left and right
should be the horizon in the direction which is directly in front of you. The
top of the panoramic texture map should be a constant sky color, and the
bottom a constant ground color. When the yaw of the viewing angle changes
it is a simple matter to adjust the S position of the texture map so that the
new "directly behind" position is the new center of the texture map.
Reflection mapping requires the lighting and texture generation mode bits
to be turned on. The first mode (regular) is set using the macro:

    gsSPSetGeometryMode(G_LIGHTING | G_TEXTURE_GEN),

while the second mode (linear) is set with

    gsSPSetGeometryMode(G_LIGHTING | G_TEXTURE_GEN |
        G_TEXTURE_GEN_LINEAR),

Compatibility with Specular Highlighting

Reflection mapping uses texture mapping so it cannot be used with objects
which are otherwise texture mapped. However, reflection mapping can be
used in conjunction with one specular highlight. This is analogous to
rendering two specular highlights, and utilizes the 2 cycle mode. The
specular highlight texture is set for a second tile and accessed in the second
cycle. Alternatively, specular highlights can be combined with reflection
mapping by incorporating the specular highlights (as bright dots) into the
reflection map texture wherever the lights are located. This technique
permits an unlimited number of specular highlights.

Environment Mapping 

Reflection mapping provides a simple means for carrying out environment
mapping. The texture map needs to be an image of the environment as seen 
from the "viewpoint" of the reflecting object. The main difficulty with this 
procedure is, of course, generating a suitably realistic texture map. 

One simple, yet effective, way to generate an environment map is to first
render the scene as viewed by the object. Render all the objects in the scene
using a viewing matrix obtained from a guLookAt call where the Eyex,
Eyey, Eyez is at the center of the object and Atx, Aty, Atz is at the eyepoint.
Render this scene into a 16 bit, 32 pixel x 32 pixel framebuffer which is not
part of the main framebuffer. Then re-render the entire scene into the main
framebuffer using the previously rendered 32x32 pixel texture map as an
environment map for the reflective object. Larger texture maps can be used
by playing with tiling. This is not a mathematically perfect way to generate
an environment map, but it is relatively cheap, and very effective. Try using
different aperture angles in the perspective call while rendering the texture
map and turning G_TEXTURE_GEN_LINEAR on or off to tweak the effect.
Vertex Fog State



Fog alters the color of objects based on their distance from the eye position.
Fog can be used to make objects blend into the background color as they get
farther away. One problem which can be fixed by fog is that when an object
goes beyond the far clipping boundary and is clipped away it suddenly
disappears. If fog is enabled then objects can be made to look more and more
like the background color until, when the object reaches the far clipping
plane, the object is exactly the same color as the background and no one
notices when it disappears.

The use of fog requires that the following steps be taken:

1) Run in two cycle mode.

2) Set the render mode to blend the fog color with the primitive color.

3) Set the fog position.

4) Enable fog.

5) Set the Fog Color.

For example:

    /* 2 cycle mode */
    gsDPSetCycleType(G_CYC_2CYCLE),
    /* blend fog in AA ZB mode */
    gsDPSetRenderMode(G_RM_FOG_SHADE_A, G_RM_AA_ZB_OPA_SURF2),
    /* set fog position and enable fog */
    gsSPFogPosition(FOG_MIN, FOG_MAX),
    gsSPSetGeometryMode(G_FOG),
    /* set the fog color */
    gsDPSetFogColor(RED, GREEN, BLUE, ALPHA),

FOG_MIN specifies the position where fog begins and FOG_MAX
represents where fog is thickest. Both values are integers and are mapped
linearly such that 0 is at the near clipping plane and 1000 is at the far
clipping plane. FOG_MAX is generally set to 1000 so that objects are
completely "fogged out" when they hit the far plane, but not before then.
FOG_MIN is set to the position where fog starts. A value of 0 will make the
object slowly change to fog color as it retreats from the viewer, while a larger
value (e.g. 800) will make the object clearly visible until it gets 80% of the way
to the far plane, where it will finally begin to "fog out". Note that
perspective makes distant objects look *much* farther away than nearby
objects. Because of this, some objects which don't appear to be very far away
may be more affected by fog than expected, even though the FOG_MIN
value is fairly high. To remedy this problem, simply increase the FOG_MIN
value until you get the desired effect. For example, if you set FOG_MIN to
500, but objects which are about midway between the far and near planes
look foggier than they should, just increase the value of FOG_MIN until they
look better.
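
The linear mapping described above can be sketched in C. This is only a model of the ramp between FOG_MIN and FOG_MAX on the 0 to 1000 scale, not the arithmetic the microcode actually performs; the function name fog_fraction is hypothetical and is not part of the GBI.

```c
/* Hypothetical helper, not a GBI call: fraction of fog color blended in
 * at a given depth. pos, fog_min and fog_max all use the 0..1000 scale
 * (0 = near clipping plane, 1000 = far clipping plane). */
float fog_fraction(int pos, int fog_min, int fog_max)
{
    float f;

    if (fog_max == fog_min)             /* degenerate range: hard step */
        return pos >= fog_max ? 1.0f : 0.0f;

    f = (float)(pos - fog_min) / (float)(fog_max - fog_min);
    if (f < 0.0f) f = 0.0f;             /* nearer than FOG_MIN: no fog */
    if (f > 1.0f) f = 1.0f;             /* beyond FOG_MAX: fully fogged */
    return f;
}
```

With FOG_MIN = 800 and FOG_MAX = 1000, an object halfway between the two positions (pos = 900) would be blended half toward the fog color under this model.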

Fog works well when the background is a constant color (the same as the fog
color). When the horizon color is complicated (e.g. clouds, gradient colors,
etc.), you can make objects become transparent when they are distant. To do
this, don't set the G_RM_FOG_SHADE_A render mode or the fog color. Just
enable fog, use a transparent render mode, and swap FOG_MAX and
FOG_MIN. FOG_MIN should be set to 1000 to make the object completely
transparent when it is at the far clipping plane. FOG_MAX should be a large
enough value that fog has no effect until the object is farther away than any
other objects are likely to be (i.e. beyond mountains and other terrain, etc.).
Because transparency is used, the z-buffer will not keep things behind the
transparent-fogged object from being hidden, so it should only be enabled
for objects which are already fairly far from the viewer. This special
transparent-fog mode should be used with caution (as compared with the
regular fog effect described in the preceding paragraphs, which should work
consistently).

Fog is independent of lighting and texture mapping, so it may be used in
conjunction with any, all, or none of these other effects. 
Primitives



Availability of different geometry primitives depends on the version of the
RSP microcode which has been loaded for execution. 

Triangles

Table 12-11 gsSP1Triangle(int v0, int v1, int v2, int flag)

Parameter   Values

v0          vertex buffer index of the first coordinate (0-15)
v1          vertex buffer index of the second coordinate (0-15)
v2          vertex buffer index of the third coordinate (0-15)
flag        used for flat shading; ordinal id of the vertex parameter to
            use for shading: 0, 1, or 2

Other bits of the flag field are currently reserved.

Lines

Table 12-12 gsSPLine3D(int v0, int v1, int flag)

Parameter   Values

v0          vertex buffer index of the first coordinate (0-15)
v1          vertex buffer index of the second coordinate (0-15)
flag        unused (should be 0)

Lines are only available when running the line microcode. All the normal
vertex attributes (color, texture, z) are also available for lines. Lines,
however, require different RDP render modes from those used for polygons.
Consult the man pages for more details. Z-buffered lines only read the
z-buffer and do not write it. Thus z-buffered lines should be drawn after
z-buffered polygons.

Rectangles 

All rectangles are 2D primitives, specified in screen-coordinates. They are 
not clipped, but they are scissored in a limited fashion. In 1CYCLE and 
2CYCLE mode, rectangles are scissored in the same way as triangles. In
COPY and FILL modes, rectangles are scissored to four pixel boundaries,
meaning that additional scissoring may be necessary in the application
program.

Filled rectangles are implemented entirely in the RDP as "pass-through"
commands with respect to the RSP. They are mentioned here for
completeness:

Table 12-13 gsDPFillRectangle(unsigned int ulx, unsigned int uly, unsigned int lrx,
            unsigned int lry)

Parameter   Values

ulx         screen coordinate of upper-left x (10.2 format)
uly         screen coordinate of upper-left y (10.2 format)
lrx         screen coordinate of lower-right x (10.2 format)
lry         screen coordinate of lower-right y (10.2 format)
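
The 10.2 format in the table above is an unsigned fixed-point value with 10 integer bits and 2 fraction bits, so one unit is a quarter of a pixel. The following sketch shows the encoding only; the helper names are hypothetical, and whether a particular macro expects pre-converted values should be checked against the man pages.

```c
/* Hypothetical helpers, not GBI calls.
 * Convert a screen coordinate in pixels to unsigned 10.2 fixed point
 * (multiply by 4, round to nearest, keep the low 12 bits). */
unsigned int to_fixed_10_2(float pixels)
{
    return (unsigned int)(pixels * 4.0f + 0.5f) & 0xFFFu;
}

/* Convert a 10.2 fixed-point value back to pixels. */
float from_fixed_10_2(unsigned int fixed)
{
    return (float)(fixed & 0xFFFu) / 4.0f;
}
```

Under this encoding, pixel coordinate 319.75 becomes 1279 and the largest representable coordinate is 1023.75.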

Textured rectangles require minimal RSP intervention, and are thus an SP
operation:

Table 12-14 gsSPTextureRectangle(unsigned int ulx, unsigned int uly, unsigned int
            lrx, unsigned int lry, int tile, short int s, short int t, short int dsdx,
            short int dtdy)

Parameter   Values

ulx         screen coordinate of upper-left x (10.2 format)
uly         screen coordinate of upper-left y (10.2 format)
lrx         screen coordinate of lower-right x (10.2 format)
lry         screen coordinate of lower-right y (10.2 format)
tile        which tile in TMEM to use
s           s coordinate of upper-left corner (S10.5 format)
t           t coordinate of upper-left corner (S10.5 format)
dsdx        change in s per change in x coordinate (S5.10 format)
dtdy        change in t per change in y coordinate (S5.10 format)
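
The signed formats in the table above can be sketched the same way: S10.5 has 5 fraction bits (one unit is 1/32 texel) and S5.10 has 10 fraction bits (one unit is 1/1024 texel per pixel). The helper names below are hypothetical, and the sketch does not guard against values that overflow 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: round a value to signed fixed point with the
 * given number of fraction bits. No overflow checking is done. */
static int16_t to_fixed(float value, int frac_bits)
{
    float scaled = value * (float)(1 << frac_bits);
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* s, t: texel coordinates in S10.5 */
int16_t to_s10_5(float texels)
{
    return to_fixed(texels, 5);
}

/* dsdx, dtdy: texels stepped per screen pixel in S5.10 */
int16_t to_s5_10(float texels_per_pixel)
{
    return to_fixed(texels_per_pixel, 10);
}
```

A step of one texel per pixel is therefore 1024 in S5.10, and a texture drawn at twice its size uses 512.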

There is a related macro, gsSPTextureRectangleFlip(), that is identical to
gsSPTextureRectangle(), except that the texture is flipped so that the s
coordinate changes in the y direction, and the t coordinate changes in the x
direction:

Table 12-15 gsSPTextureRectangleFlip(unsigned int ulx, unsigned int uly, unsigned
            int lrx, unsigned int lry, int tile, short int s, short int t, short int dtdx,
            short int dsdy)

Parameter   Values

ulx         screen coordinate of upper-left x (10.2 format)
uly         screen coordinate of upper-left y (10.2 format)
lrx         screen coordinate of lower-right x (10.2 format)
lry         screen coordinate of lower-right y (10.2 format)
tile        which tile in TMEM to use
s           s coordinate of upper-left corner (S10.5 format)
t           t coordinate of upper-left corner (S10.5 format)
dtdx        change in t per change in x coordinate (S5.10 format)
dsdy        change in s per change in y coordinate (S5.10 format)
Controlling the RDP State



The RSP performs two functions to support programming the RDP:
segmented address fix-up and handling othermode.

Segmented address fix-up. Since the RDP is a physical address machine, the
RSP must translate the segmented addresses present in the display list into
physical addresses for the RDP. It does so by filtering out any RDP command
with an address (the 'set image' commands) and patching the address before
passing it to the RDP.

The RDP setothermode register is a collection of state bits, affecting many
different functions of the RDP. In order to simplify programming the RDP
state, the RSP caches the SETOTHERMODE command, and presents a
simpler "set/clear" interface through the display list. See Chapter 13, "RDP
Programming," for more details of these macros.
Chapter 13

RDP Programming



The Reality Display Processor (RDP) rasterizes triangles and rectangles, and
produces high-quality, Silicon Graphics style pixels that are textured,
antialiased, and z-buffered.

The RDP has four main configurations where all the individual blocks work
together to generate pixels. These main configurations are called "cycle
types," because they indicate how many pixels are generated per cycle. The
following table indicates their peak performance. Keep in mind that these
peak numbers are typically realized on large rectangle primitives. Triangles
have variable short and long spans, and these numbers degrade rapidly.

Table 13-1 Cycle Types

Type      Performance

FILL      4 16-bit pixels/cycle
          2 32-bit pixels/cycle
COPY      4 pixels/cycle
1CYCLE    1 pixel/cycle
2CYCLE    1 pixel/2 cycles

Note: These are theoretical peak performances. In reality, due to memory
latency and buffering overhead, actual performance numbers are lower.
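
As a rough illustration of Table 13-1, the sketch below computes the best-case number of RDP cycles needed to produce a given number of pixels in each cycle type. The enum and function names are hypothetical (they deliberately avoid the real G_CYC_* constants), and the result is a lower bound only, since it ignores the memory latency and buffering overhead mentioned in the note.

```c
/* Hypothetical names, not GBI constants. */
enum cycle_kind { CYCLE_FILL, CYCLE_COPY, CYCLE_1, CYCLE_2 };

/* Best-case cycles to produce `pixels` pixels, per Table 13-1.
 * bits_per_pixel only matters in fill mode (16 or 32). */
unsigned long peak_cycles(enum cycle_kind kind, unsigned long pixels,
                          int bits_per_pixel)
{
    switch (kind) {
    case CYCLE_FILL: {
        unsigned long per_cycle = (bits_per_pixel == 32) ? 2UL : 4UL;
        return (pixels + per_cycle - 1UL) / per_cycle;   /* round up */
    }
    case CYCLE_COPY:
        return (pixels + 3UL) / 4UL;                     /* 4 pixels/cycle */
    case CYCLE_1:
        return pixels;                                   /* 1 pixel/cycle */
    case CYCLE_2:
        return pixels * 2UL;                             /* 1 pixel/2 cycles */
    }
    return 0UL;
}
```

For a 320 x 240 screen (76800 pixels), this gives 19200 cycles for a 16-bit fill and 153600 cycles in two-cycle mode.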
RDP Pipeline Blocks

The RSP performs 3D geometric transformations while the RDP pipeline
rasterizes the polygon. The RDP consists of several pipeline subblocks. There
are six major logical RDP blocks: the RS, TX, TF, CC, BL, and MI. The
connections between these blocks can be reconfigured to the four cycle types
listed in Table 13-1, to perform different rasterization operations.

Table 13-2 Basic Operations of RDP Subblocks

Block   Functionality

RS      The Rasterizer generates pixel coordinates and their attributes'
        slopes. Pixel coordinates consist of X and Y. Attributes consist of
        R, G, B, A, Z, S/W, T/W, 1/W, L, pixel coverage.

TX      The Texturing unit contains texture memory and samples the
        texture based on which texel represents the pixel being
        processed in the pipeline.

TF      The Texture Filter performs a 4-to-1 bilinear filter of 4 texel
        samples to produce a single bilinear filtered texel.

CC      The Color Combiner performs general blending of color sources
        by linearly interpolating between two colors with a coefficient.
        For example, it may take the filtered texel samples and the
        shading color (RGBA) and combine them together.

BL      The Blender blends the pipeline-processed pixels with the pixels
        in the framebuffer. The blender can do transparencies and also
        sophisticated antialiasing operations.

MI      The Memory Interface performs the actual read/modify/write
        cycles to and from the framebuffer.



Note: The six RDP blocks (RS, TX, TF, CC, BL, and MI) are purely logical
blocks. For example, the hardware implementation of RS consists of several
blocks. However, for programming, each can be treated as a single logical
block.
One-Cycle-per-Pixel Mode

The pipeline configuration illustrated in Figure 13-1 shows how the RDP
blocks are connected in one-cycle-per-pixel mode.

Figure 13-1 One-Cycle Mode RDP Pipeline Configuration
[Figure: the rasterizer (RS) feeds the per-pixel operators; the texture maps
and the framebuffer reside in DRAM.]


Table 13-3 RDP Pipeline Block Functionality in One-Cycle Mode

Block   Functionality

RS      Generates a pixel and its attributes covered by the interior of the
        primitive.

TX      Generates 4 texels nearest to this pixel in a texture map.

TF      Bilinear filters 4 texels into 1 texel,
        OR performs step 1 of YUV-to-RGB conversion.

CC      Combines various colors into a single color,
        OR performs step 2 of YUV-to-RGB conversion.

BL      Blends the pixel with framebuffer memory pixel,
        OR fogs the pixel for writing to framebuffer.

MI      Fetches and writes pixels from and to the framebuffer memory.

One-cycle mode fills a fairly high-quality pixel. You can generate pixels that
are perspectively corrected, bilinear filtered, modulate/decal textured,
transparent, and z-buffered, at one-cycle-per-pixel peak bandwidth.
Note: Reaching peak bandwidth is difficult. The framebuffer memory is
organized in row order. In small triangles, it is rare to have long horizontal
runs of pixels on a single scanline. In these cases, the pipeline is often stalled,
pending memory access for read or write cycles.



Two-Cycles-per-Pixel Mode

The RDP blocks can be reconfigured into a two-cycle-per-pixel pipeline
structure for additional functionality. Figure 13-2 shows the RDP pipeline in
2-cycle mode where one pixel is generated every 2 clocks.

Figure 13-2 Two-Cycle Mode RDP Pipeline Configuration
[Figure: the rasterizer feeds the per-pixel operators.]
Table 13-4 RDP Pipeline Block Functionality for Two-Cycle Mode

Block   Functionality

RS      Generates a pixel and its attributes covered by the interior of the
        primitive.

TX0     Generates 4 texels nearest to this pixel in a texture map. This can
        be level X of a mipmap.

TX1     Generates 4 texels nearest to this pixel in a texture map. This can
        be level X+1 of a mipmap.

TF0     Bilinear filters 4 texels into 1 texel.

TF1     Bilinear filters 4 texels into 1 texel,
        OR performs step 1 of YUV-to-RGB conversion.

CC0     Combines various colors into a single color,
        OR linearly interpolates the 2 bilinear filtered texels from 2
        adjacent levels of a mipmap,
        OR performs step 2 of YUV-to-RGB conversion.

CC1     Combines various colors into a single color,
        OR performs chroma keying.

BL0     Combines fog color with resultant CC1 color.

BL1     Blends the pipeline pixels with framebuffer memory pixels.

MI0     Read/modify/write color memory.

MI1     Read/modify/write Z memory.

Two-cycles-per-pixel mode contains more features than one-cycle-per-pixel
mode. In addition to all of the features of one-cycle mode, two-cycle mode
can also do mipmapping and fog.

Note: MI0 and MI1 represent two cycles of the MI that access color and z
framebuffer cycles, respectively. This is only a logical representation. The MI
does not need to run two cycles to do color and z-buffer access. One-cycle-per-
pixel mode can also perform color and z-buffer accesses. The reason for this
representation is to show that two MI access cycles are balanced in the
two-cycle mode. In one-cycle mode, the pipeline is often stalled at MI,
waiting for the framebuffer when accessing both color and z.

These RDP blocks are very flexible and can be configured to do many things.
Table 13-4 outlines the typical usage of these blocks for a powerful
rasterization pipeline. Study the following sections to understand what
attribute state is programmable within each RDP block to master the raster
subsystem.
Fill Mode

For high-performance framebuffer clearing, the RDP has a fill mode, which
can fill 64 bits per clock. A programmable RDP color attribute is written into
the framebuffer during each 64-bit write cycle. The RDP arithmetic pipeline
is largely unused, because the computation cannot keep up with the pixel
fill rate. The fill mode is most commonly used for clearing color and
z-buffers.

Note: In fill mode, use the render mode
g*DPSetRenderMode(G_RM_NOOP, G_RM_NOOP2) to put the blender
into a safe state. Attempting to read when in fill mode can cause the RDP
pipeline to hang.



Copy Mode

For high-performance image-to-image copies, the RDP also supports a copy
mode that can copy 64 bits or 4 pixels per clock. The RDP texture memory in
the TX is just a buffer capable of holding up to 4 KB worth of image pixels.
You can load bitmaps into this buffer as well as write them back out to the
framebuffer. This is a common bit blit operation that many 2D graphics
hardware systems support. Once again, the RDP arithmetic pipeline is
largely unused in copy mode.

Note: One important operation that does work in copy mode is alpha
compare. This allows the RDP to blit an image into the framebuffer and
conditionally remove image pixels with alpha = 0. Usually, images with
alpha = 0 represent transparency; see "Alpha Compare Calculation" on
page 315 for more details.
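
The effect of alpha compare during a copy can be modeled on the CPU as follows. This is a sketch of the behavior only, not of how the RDP implements it. It assumes 16-bit 5/5/5/1 pixels with the alpha bit in bit 0, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU model of a copy with alpha compare.
 * Copies n 16-bit 5/5/5/1 pixels from src to dst, leaving dst untouched
 * wherever the source alpha bit (assumed to be bit 0) is 0.
 * Returns the number of pixels actually written. */
size_t copy_span_alpha(uint16_t *dst, const uint16_t *src, size_t n)
{
    size_t written = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (src[i] & 1u) {          /* alpha = 1: opaque, copy it */
            dst[i] = src[i];
            written++;
        }                           /* alpha = 0: transparent, skip */
    }
    return written;
}
```

Sprites with cut-out regions rely on this behavior: the transparent texels never overwrite what is already in the framebuffer.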

Note: In copy mode, use the render mode
g*DPSetRenderMode(G_RM_NOOP, G_RM_NOOP2) to put the blender
into a safe state.
RDP Global State



Several states are global to the RDP, usually to specify pipeline configuration
and synchronization.



Cycle Type 

To configure the pipeline for rendering, choose one of the cycle types that
offers the functionality required at peak performance.

Table 13-5 gsDPSetCycleType(type)

Parameter   Values

type        G_CYC_1CYCLE
            G_CYC_2CYCLE
            G_CYC_COPY
            G_CYC_FILL



Synchronization 

You might ask "How does the primitive rendering pipeline synchronize
with all of the different attribute states that the programmer can set?"
Imagine that the last few pixels are being processed in the RDP pipeline
when it receives a new attribute command, and this command affects the
pixel currently being processed. You would not want the last few pixels of a
primitive to have the attributes of a following primitive. You want the
attribute state to modify only the pixels of the primitive following
the attribute state change. This synchronization is not implicit within the
pipeline; the application must explicitly insert proper synchronization
between attribute state changes and primitives.

Table 13-6 gsDPPipeSync()

Parameter   Values

none        none
This command synchronizes the attribute update with respect to primitive
rendering. It ensures that the last pixels of a primitive are rendered prior to
the attribute taking effect. Insert it between an RDP primitive and a
following RDP attribute:

gDPSetCycleType(glistp++, G_CYC_FILL);
gDPFillRectangle(glistp++, ulx, uly, lrx, lry);
gDPPipeSync(glistp++);
gDPSetCycleType(glistp++, G_CYC_1CYCLE);

Note: After a primitive (e.g. gSP1Triangle, gDPFillRectangle,
gDPTextureRectangle) and before an RDP attribute (e.g. gDPSet*), you need
to insert a gDPPipeSync.
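
The rule in the note lends itself to a simple offline check. The sketch below walks an abstract list of command kinds and counts the places where an attribute change follows a primitive with no pipe sync in between. The enum and function are hypothetical debugging aids; they do not parse real display list words.

```c
#include <stddef.h>

/* Hypothetical classification of display list commands. */
enum cmd_kind { CMD_PRIMITIVE, CMD_ATTRIBUTE, CMD_PIPE_SYNC };

/* Count attribute changes that follow a primitive without an
 * intervening pipe sync. A run of attributes after one primitive
 * is reported once. */
size_t count_missing_pipe_syncs(const enum cmd_kind *cmds, size_t n)
{
    size_t missing = 0;
    int primitive_pending = 0;      /* primitive seen, not yet synced */
    size_t i;

    for (i = 0; i < n; i++) {
        switch (cmds[i]) {
        case CMD_PRIMITIVE:
            primitive_pending = 1;
            break;
        case CMD_PIPE_SYNC:
            primitive_pending = 0;
            break;
        case CMD_ATTRIBUTE:
            if (primitive_pending) {
                missing++;
                primitive_pending = 0;
            }
            break;
        }
    }
    return missing;
}
```

The sequence attribute, primitive, sync, attribute passes the check, while primitive, attribute is flagged once.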

After processing all of the RDP display list, the host processor must be
interrupted and notified.

Table 13-7 gsDPFullSync()

Parameter   Value

none        none

gDPFullSync() also shuts down the RDP until it is given a new DP DL, to
eliminate excessive power consumption.



Span Buffer Coherency 

For RMW (read/modify/write) cycles, the RDP is smart enough to prefetch a
row of pixels as soon as the X, Y coordinates of the span are determined.
The RDP then preloads the framebuffer content of this span into an RDP
onchip span buffer. The RDP then waits for the pipeline to process the
parameters for the outgoing pixels. When the outgoing pixels are computed,
they are "combined" with the preloaded framebuffer pixels before writing
back to the framebuffer.

An example of this operation is z-buffer and transparency blending. (This is
not shown in the logical pipeline description earlier, to simplify the
understanding of the pipeline.)
The RDP has enough onchip RAM to hold several span buffers. Therefore,
what would happen if two spans in sequence happened to overlap the same
screen area? The RDP would prefetch the first span into a span buffer while
the pipeline starts processing this span. Then it would prefetch the next span
into another span buffer.

This is where the problems occur: the pixel data for the next span is not yet
computed. The RDP does have span buffer coherency, at the cost of some
performance. If errors are objectionable in your animation, use
gsDPPipelineMode(G_PM_1PRIMITIVE) to cause all primitives to add
between 30 and 40 null cycles after the last span of a primitive is rendered.

Table 13-8 gsDPPipelineMode(mode)

Parameter   Value

mode        G_PM_1PRIMITIVE
            G_PM_NPRIMITIVE

These dead cycles can be expensive in terms of fill rate, so it is recommended
not to use the 1PRIMITIVE mode unless absolutely necessary.
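
To get a feel for the cost, the 30 to 40 null cycles per primitive quoted above can be multiplied out for a frame. The struct and function below are hypothetical and only restate that arithmetic.

```c
/* Hypothetical estimate of the dead cycles added per frame when every
 * primitive is followed by 30 to 40 null cycles. */
struct cycle_range {
    unsigned long min_cycles;
    unsigned long max_cycles;
};

struct cycle_range one_primitive_overhead(unsigned long primitives)
{
    struct cycle_range r;

    r.min_cycles = primitives * 30UL;
    r.max_cycles = primitives * 40UL;
    return r;
}
```

A scene of 1000 primitives would spend between 30000 and 40000 cycles doing nothing, which is why the mode is best reserved for cases where the artifacts are visible.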
RS: Rasterizer



The Rasterizer's main job is implied in its name: to generate pixels that cover
the interior of the primitive. The primitives are either triangles or rectangles.
For each pixel, the RS generates the following attributes:

• screen x, y location

• z depth for z-buffer purposes

• RGBA color information

• s/w, t/w, 1/w, lod for texture index, perspective correction, and
mipmapping.

These are commonly referred to as s, t, w, l.

• coverage value

Pixels on the edge of primitives have partial coverage values. Interiors
are full.

These values are sent to the pipelined blocks downstream for other
computations, such as texture sampling, color blending, and so on.

Figure 13-3 RS State and Input/Output
[Figure: input is a triangle or rectangle; output is stepped pixels
(xyzrgbastwl, cvg).]



Scissoring 

Scissoring is commonly used to eliminate running performance-intensive 
clipping code in the geometry processing stage of a graphics pipeline. You 
do this by projecting the clipping rectangle at the near plane larger than the 
scissor rectangle. The rasterizer can then efficiently eliminate the portion
outside of the screen rectangle.

The RSP geometry processing is performed in fixed-point arithmetic. The 
clipped rectangle boundary is not a perfect rectangle, because of precision 
errors. This artifact can also be eliminated using the scissoring rectangle. 

Figure 13-4 Scissor/Clipping/Screen Rectangles
[Figure: the clipping rectangle at the near plane encloses the smaller
scissor/screen rectangle; triangles A through E are drawn against both.]

Triangle A is scissored, but not clipped. B, C and E are trivially rejected
because no pixels are enumerated. Only D is clipped and scissored.

Table 13-9 gsDPSetScissor(ulx, uly, lrx, lry)

Parameter   Value

ulx         upper left x
uly         upper left y
lrx         lower right x
lry         lower right y



Note: Rectangles are scissored with some restrictions. In 1CYCLE and 
2CYCLE mode, rectangles are scissored the same as triangles. In FILL and 
COPY mode, rectangles are scissored to the nearest four pixel boundary; this 
might require rectangles to be scissored in screen space by the game 
software. 
TX: Texture Engine



The Texture Engine takes s/w, t/w, 1/w, and lod values for a pixel and
fetches from the onboard texture memory the four nearest texels to the screen
pixel. The game application can manipulate TX states such as texture image
types and formats, how and where to load texture images, and texture
sampling attributes.

Figure 13-5 TX State and Input/Output
[Figure: texture images are loaded from DRAM.]


Texture Tiles

TX treats the 4 KB on-chip texture memory (TMEM) as general-purpose
texture memory. The texture memory is divided into four simultaneously
accessible banks, giving output of four texels per clock.

The game application can load varying-sized textures with different formats
anywhere in the 4 KB texture map. There are eight texture tile descriptors
that describe the location of texture images within the TMEM, the format of
this texture, and the sampling parameters. Therefore, you can load many
texture maps in the TMEM at one time, but there are only eight tiles that are
accessible at any time.
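
Before loading, it can help to check whether a texture fits in the 4 KB TMEM at all. The sketch below does only the size arithmetic from the texel count and texel size. It is an assumption-laden estimate: it ignores the 64-bit alignment restrictions and the color-index partitioning described later in this chapter, and the names are hypothetical.

```c
#define TMEM_BYTES 4096u    /* 4 KB on-chip texture memory */

/* Hypothetical helper: raw storage needed for a width x height texture
 * with the given texel size in bits (4, 8, 16 or 32), rounded up to
 * whole bytes. Alignment padding is not included. */
unsigned int texture_bytes(unsigned int width, unsigned int height,
                           unsigned int bits_per_texel)
{
    return (width * height * bits_per_texel + 7u) / 8u;
}

/* Nonzero if the raw texture data would fit in TMEM by itself. */
int fits_in_tmem(unsigned int width, unsigned int height,
                 unsigned int bits_per_texel)
{
    return texture_bytes(width, height, bits_per_texel) <= TMEM_BYTES;
}
```

A 64 x 32 texture at 16 bits per texel fills TMEM exactly, while a 64 x 64 texture only fits at 4 or 8 bits per texel.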

Figure 13-6 Tile Descriptors and TMEM
[Figure: eight tile descriptors in total (tile 0 through tile 7), each
holding a TMEM location, size, wrap/clamp/mirror state, and format,
pointing into TMEM.]

Note: There are some restrictions, depending on texel size and 64-bit
alignment within the texture memory. See "Alignment" on page 259.



Multiple Tile Textures 

Given the eight texture tiles, you can use two-cycle pipeline mode to cycle
TX twice and access eight texels (four from each of two tiles). This
functionality, coupled with the use of up to eight texture tiles, allows the TX
to perform mipmapping and detailed textures.

Furthermore, there are no explicit restrictions requiring power of two
tile-sized decrements for mipmaps. Multi-tile texture map sizes are all
independently programmable. Therefore, using these tiles and the color
combiner block (see Chapter 13, "CC: Color Combiner"), arithmetic logic
can result in many special effects. For example, sliding two different
frequency band tiles across a polygon surface while combining them with a
blue polygon can give a nice ocean wave effect.
Texture Image Types and Format

Table 13-10 shows the legal combinations of data types and pixel/texel sizes
for the Color and Texture images. For RGBA types, the 16-bit format is
5/5/5/1, and the 32-bit format is 8/8/8/8.

The Intensity Alpha type (IA) replicates the I value on the RGB channels and
places the A value on the A channel. The IA 16-bit format is 8/8, the 8-bit
format is 4/4, and the 4-bit format is 3/1.
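
The bit layouts above can be illustrated with two packing helpers. The field order is an assumption: red is taken to occupy the most significant bits and alpha the least significant bit of a 5/5/5/1 pixel, and intensity is taken to occupy the high byte of an 8/8 IA texel. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: pack 8-bit channels into a 16-bit 5/5/5/1 pixel.
 * Assumed layout: RRRRRGGG GGBBBBBA. Alpha keeps only its top bit. */
uint16_t pack_rgba5551(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) |
                      ((g >> 3) << 6)  |
                      ((b >> 3) << 1)  |
                      (a >> 7));
}

/* Hypothetical helper: pack intensity and alpha into a 16-bit 8/8 IA
 * texel, intensity assumed to be in the high byte. */
uint16_t pack_ia88(uint8_t i, uint8_t a)
{
    return (uint16_t)((i << 8) | a);
}
```

Under these assumptions opaque white packs to 0xFFFF and fully transparent red packs to 0xF800.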

Table 13-10 Texture Format and Sizes

Type          4b    8b    16b   32b

RGBA                      X     X
YUV                       X
Color Index   X     X
IA            X     X     X
I             X     X


Texture Loading

Several steps are necessary to load a texture map into the TMEM. You must
block-load the texture map itself and set up the attributes for this tile. There
are GBI macros that simplify all these steps into a single macro.

There are two ways of loading textures: block or tile mode. Block mode
assumes that the texture map is a contiguous block of texels that represents
the whole texture map. Tile mode can lift a subrectangle out of a larger
image. The following tables list block and tile mode texture-loading GBI
commands respectively.



Table 13-11 gsDPLoadTextureTile(timg, fmt, siz, width, height, uls, ult, lrs, lrt, pal,
            cms, cmt, masks, maskt, shifts, shiftt)

Table 13-12 gsDPLoadTextureTile_4b(pkt, timg, fmt, width, height, uls, ult, lrs, lrt,
            pal, cms, cmt, masks, maskt, shifts, shiftt)

Parameter       Value

timg            Texture DRAM address.

fmt             G_IM_FMT_RGBA
                G_IM_FMT_YUV
                G_IM_FMT_CI
                G_IM_FMT_I
                G_IM_FMT_IA

siz             G_IM_SIZ_4b
                G_IM_SIZ_8b
                G_IM_SIZ_16b
                G_IM_SIZ_32b

width, height   Texture tile width and height in texels.

pal             TLUT palette.

cms, cmt        Clamping/mirroring for s/t axis:
                G_TX_NOMIRROR
                G_TX_MIRROR
                G_TX_WRAP
                G_TX_CLAMP

masks, maskt    Bit mask for wrapping.
                G_TX_NOMASK or a number: a wrapping bit mask is
                represented by (1 << number) - 1.

shifts, shiftt  Shifts applied to the s/t coordinate of each pixel. This is how
                you "sample" the lower levels of the mipmap.
                G_TX_NOLOD or a number: a right shift (>> number) is
                applied to s/t to sample other mipmap levels.

uls             upper left s index of the tile within the texture image
ult             upper left t
lrs             lower right s
lrt             lower right t
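
The mask and shift parameters in the table above reduce to simple bit operations on a texel coordinate. The sketch below shows that arithmetic on integer coordinates. It is a simplified model with hypothetical function names: the real TX also applies clamping and mirroring and works on fixed-point coordinates.

```c
/* Hypothetical helpers modeling the mask and shift parameters. */

/* Wrapping bit mask for a mask parameter of `number`: (1 << number) - 1 */
unsigned int wrap_mask(unsigned int number)
{
    return (1u << number) - 1u;
}

/* Wrap an integer texel coordinate into a tile 2^number texels wide. */
unsigned int wrap_coord(unsigned int coord, unsigned int number)
{
    return coord & wrap_mask(number);
}

/* Scale a coordinate down for a smaller mipmap level. */
unsigned int shift_coord(unsigned int coord, unsigned int shift)
{
    return coord >> shift;
}
```

With a mask parameter of 5 the tile repeats every 32 texels, so coordinate 37 wraps to 5; a shift of 1 halves the coordinate for the next mipmap level down.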



Color-Indexed Textures



There are some restrictions on the size and placement of CI texture maps
within the TMEM. The TMEM is actually partitioned into two halves. Four
texels are sampled from the first bank and fed into the second bank for
texture/color/index table lookup (TLUT).



Figure 13-7 CI TMEM Partition
[Figure: the first half bank (banks 0 to 3) supplies four texel indices;
the second half bank (banks 0 to 3) holds four copies of the palette
entries C0, C1, ... Cn and outputs texels t0, t1, t2, t3.]


Four texels from the texture images are sent from the first half banks to the
second half banks. The second half banks contain color index palettes. Each
color map entry is replicated 4 times for four simultaneous bank lookups.
Therefore, 8-bit CI textures all require the full 2 KB (256 x 64 bits per entry)
of the second half banks to hold the TLUT, while 4-bit CI textures can have
up to 16 separate TLUTs.
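
The TLUT sizes quoted above follow from the four-way replication. The sketch below assumes 16-bit palette entries (so each replicated entry occupies 64 bits) and a 2 KB second half of TMEM; the macro and function names are hypothetical.

```c
#define TLUT_HALF_BYTES 2048u   /* second half of the 4 KB TMEM */
#define TLUT_ENTRY_BITS 16u     /* assumed palette entry size */
#define TLUT_REPLICAS   4u      /* one copy per bank */

/* Bytes of TMEM used by a palette with `entries` colors. */
unsigned int tlut_bytes(unsigned int entries)
{
    return entries * TLUT_REPLICAS * TLUT_ENTRY_BITS / 8u;
}

/* How many palettes of that size fit in the second half of TMEM. */
unsigned int tluts_per_half(unsigned int entries)
{
    return TLUT_HALF_BYTES / tlut_bytes(entries);
}
```

A 256-entry palette for 8-bit CI textures takes all 2048 bytes, while a 16-entry palette for 4-bit CI textures takes 128 bytes, leaving room for 16 of them.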

Note: The TLUT must reside in the second half of TMEM, while CI textures
cannot reside in the second half of TMEM. Non-CI textures can actually
reside in the second half of TMEM, in unused TLUT palette entries.

Table 13-13 gsLoadTLUT(count, tmemaddr, dramaddr)

Parameter   Value

count       Number of entries in the TLUT. For example, a 4-bit texel TLUT
            would have 16 entries.
tmemaddr    Where the TLUT goes in TMEM.
dramaddr    Where the TLUT is in DRAM.



Texture-Sampling Mddes 

Software can enable and disable TX to perform the following sampling modes:

• perspective correction

• detail or sharpen textures

• LOD (mipmap) or bilinear textures

• RGBA or IA TLUT type.

Table 13-14 gsDPSetTexturePersp(mode)

Parameter   Value

mode        G_TP_NONE
            G_TP_PERSP

Table 13-15 gsDPSetTextureDetail(mode)

Parameter   Value

mode        G_TD_CLAMP
            G_TD_SHARPEN
            G_TD_DETAIL

Table 13-16 gsDPSetTextureLOD(mode)

Parameter   Value

mode        G_TL_TILE
            G_TL_LOD

Table 13-17 gsSetTextureLUT(type)

Parameter   Value

type        G_TT_NONE
            G_TT_RGBA16
            G_TT_IA16



Synchronization 

With TMEM and tile descriptor states, TX also requires explicit
synchronization to render primitives with the proper attribute state. Texture
loads after primitive rendering must be preceded by a gsDPLoadSync(), and
tile descriptor attribute changes should be preceded by a gsDPTileSync().

Note: If you use the high-level programming macros gsDPLoadTexture* or
gsDPLoadTexture*_4b, then you don't need to worry about load and tile
syncs. They are embedded in the macro.
The Texture Filter takes the four texels generated by TX and produces a simple
bilinear-filtered texel. The TF can also work together with the color combiner
(see Chapter 13, "CC: Color Combiner") to perform YUV-to-RGB color space
conversion.

Figure 13-8 Texture Filter State and Input/Output
[Figure: inputs are texels 0, 1, 2, 3; state includes the filter type and
the yuv2rgb coefficients; output is the filtered texel.]


Filter Types 

TF performs three types of filter operations: point sampling, box filter, and 
bilinear interpolation. Point sampling just selects the nearest texel to the 
screen pixel. In the special case where the screen pixel is always the center of
four texels, the box filter can be used. In a typical 3D, arbitrarily rotated
polygon, the bilinear filter is the best choice available.

Note: For hardware cost reduction, the RDP does not implement a true
bilinear filter. Instead, the three nearest texels are linearly interpolated to
produce the result pixels. This has a natural triangulation bias. This artifact
is not noticeable in normal texture images. However, in regular pattern
images, it can be noticed. For example, notches can be seen in the crosshair
on an image of grids. This can be eliminated by prefiltering the image with a
wider filter.
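
One plausible reading of the three-texel interpolation is sketched below for a single channel. The 2x2 texel cell is split along its diagonal, and the three corners of whichever triangle contains the sample point are blended. The text does not specify how the hardware picks the three texels, so the split direction and the function name here are assumptions made for illustration.

```c
/* Hypothetical model of a three-texel ("triangle") filter on one channel.
 * t00, t10, t01, t11 are the texels at (0,0), (1,0), (0,1), (1,1) of the
 * cell; fs and ft are the fractional s and t positions in [0, 1]. */
float three_point_filter(float t00, float t10, float t01, float t11,
                         float fs, float ft)
{
    if (fs + ft < 1.0f) {
        /* lower-left triangle: t00, t10, t01 */
        return t00 + fs * (t10 - t00) + ft * (t01 - t00);
    }
    /* upper-right triangle: t11, t01, t10 */
    return t11 + (1.0f - fs) * (t01 - t11) + (1.0f - ft) * (t10 - t11);
}
```

At the cell center this model returns the average of t10 and t01 and ignores t00 and t11 entirely, which is the kind of directional bias that shows up as notches on grid patterns.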

Table 13-18 gsSetTextureFilter(type)

Parameter   Value

type        G_TF_POINT
            G_TF_AVERAGE
            G_TF_BILERP



Color Space Conversion 

Color space conversion can be used to convert YUV textures into RGB. This 
could be a useful compression technique, or it could be used for MPEG
video, or for special effects.

Table 13-19 gsSetTextureConvert(mode)

Parameter   Value

mode        G_TF_CONV
            G_TF_FILTCONV
            G_TF_FILT

Table 13-20 gsSetConvert(k0, k1, k2, k3, k4, k5)

Parameters   Value

k0, k1, k2   G_CV_K0, G_CV_K1, G_CV_K2
k3, k4, k5   G_CV_K3, G_CV_K4, G_CV_K5

Note: The default state of the RDP is G_TF_CONV (perform YUV2RGB),
which is probably not what you want (if you are using RGB textures). A
common bug is to forget to set this (usually it should be G_TF_FILT).
The color combiner (CC) combines texels from TX and stepped RGBA pixel
values from RS. The CC is the ultimate paint mixer. It can take two color
values from many sources and linearly interpolate between them. The CC
basically performs this equation:

newcolor = (A - B) x C + D



Here, A, B, C, and D can come from many different sources. Notice that if
D=B, then this is a simple linear interpolator.
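The equation above can be sketched for a single channel as follows. Treating the inputs as fractions in the range 0 to 1 is an assumption made for readability, and the function names are hypothetical; nothing here models the hardware's fixed-point arithmetic.

```c
#include <assert.h>

/* One channel of the combiner equation: (A - B) * C + D. */
double cc_combine(double a, double b, double c, double d)
{
    return (a - b) * c + d;
}

/* Modulate: texel times shade, i.e. (T - 0) * S + 0. */
double cc_modulate(double texel, double shade)
{
    return cc_combine(texel, 0.0, shade, 0.0);
}

/* Blend: (P - E) * Talpha + E. Because D equals B, this is a plain
 * linear interpolation from E (alpha 0) to P (alpha 1). */
double cc_blend(double prim, double env, double texel_alpha)
{
    assert(texel_alpha >= 0.0 && texel_alpha <= 1.0);
    return cc_combine(prim, env, texel_alpha, env);
}
```

With a texel alpha of 0 the blend returns the environment color, and with 1 it returns the primitive color.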

Figure 13-9 Color Combiner State and Input/Output

(Diagram: the CC takes the stepped pixel (RGBA) from RS and the texels as
inputs; its state is the combiner modes, primitive color, and environment
color; its output is the Combined Pixel.)



Most of CC programming involves setting the desired sources for (A,B,C,D)
of the equation above. There are also programmable color registers within
the CC that can be used to source the (A,B,C,D) inputs of the interpolator.



Color and Alpha Combiner Input Sources

The following picture describes all possible input selections of a general
purpose linear interpolator for RGB and Alpha color combination. The inputs
in the shaded boxes are CC internal state that you can set. Most are
programmable color registers.

Figure 13-10 RGB Color Combiner Input Selection

NOTE: There are two Color Combine modes, one for each of the two
possible cycles.

Common Modes:

Modulate:     1,8,4,7;   T*S
Decal:        X,X,16,1;  T
Blend:        3,5,8,5;   (P - E)*Talpha + E
Trilinear:    2,1,13,1;  (T1 - T0)*LOD + T0
Interference: 1,8,2,7;   T0 * T1
Keying:       1,6,6,7;   (T0 - Center) * Scale + 0

Output: Combined Color






Figure 13-11 Alpha Combiner Input Selection

NOTE: There are two Alpha Combine modes, one for each of the two
possible cycles.

Common Modes:

Select:   X,X,7,... (remainder illegible in source)
Multiply: 1,7,2,7;  T0*T1
Lerp:     1,2,0,2;  (T0 - T1)*LODf + T1

Output: Combined Alpha



CC Internal Color Registers

There are two internal color registers in the CC: primitive and environment
color. The primitive color can be used to set a constant polygon face color.
The environment color can be used to represent the ambient color of the
environment. Both can be used as a source for linear interpolation. The names
"primitive" and "environment" are purely arbitrary; you can use them for
any purpose you wish.

Table 13-21 gsDPSetPrimColor(minlevel, frac, r, g, b, a), gsDPSetEnvColor(r, g, b, a)

Parameter    Value
minlevel     minimum LOD level
frac         LOD fraction for blending two tiles
r, g, b, a   color



One-Cycle Mode

Many of the typical RGB and alpha input selections are predefined in
Table 13-22. In one-cycle mode, both mode1 and mode2 should be the same.
See the man page for gDPSetCombineMode for a description of each mode
setting.

Table 13-22 One-Cycle Mode Using gsDPSetCombineMode(mode1, mode2)

Parameter   Value

mode1/2     G_CC_PRIMITIVE
            G_CC_SHADE
            G_CC_ADDRGB
            G_CC_ADDRGBDECALA
            G_CC_SHADEDECALA

mode1/2     Decal textures in RGB, RGBA formats.
            G_CC_DECALRGB
            G_CC_DECALRGBA

mode1/2     Modulate texture in I, IA, RGB, RGBA formats.
            G_CC_MODULATEI
            G_CC_MODULATEIA
            G_CC_MODULATEIDECALA
            G_CC_MODULATERGB
            G_CC_MODULATERGBA
            G_CC_MODULATERGBDECALA
            G_CC_MODULATEI_PRIM
            G_CC_MODULATEIA_PRIM
            G_CC_MODULATEIDECALA_PRIM
            G_CC_MODULATERGB_PRIM
            G_CC_MODULATERGBA_PRIM
            G_CC_MODULATERGBDECALA_PRIM

mode1/2     Blend texture in I, IA, RGB, RGBA formats.
            G_CC_BLENDI
            G_CC_BLENDIA
            G_CC_BLENDIDECALA
            G_CC_BLENDRGBA
            G_CC_BLENDRGBDECALA

mode1/2     Reflection and specular hilite in RGB, RGBA formats.
            G_CC_REFLECTRGB
            G_CC_REFLECTRGBDECALA
            G_CC_HILITERGB
            G_CC_HILITERGBA
            G_CC_HILITERGBDECALA

Note: In one-cycle mode, mode1 and mode2 should be the same value.
Two-Cycle Mode

Color Combiner (CC) can perform two linear interpolation arithmetic
computations in two-cycle pipeline mode. Typically, the second cycle is used
to perform texture and shading color modulation (in other words, all those
modes you saw in one-cycle mode). However, the first cycle can be used for
another linear interpolation calculation; for example, LOD interpolation
between the two bilinear filtered texels from two mipmap tiles.

Table 13-23 Two-Cycle Mode Using gsDPSetCombineMode(mode1, mode2)

Parameter   Value

mode1       G_CC_TRILERP
            G_CC_INTERFERENCE

mode2       G_CC_PASS2
            Most of the Decal, Modulate, Blend and Reflection/Hilite texture
            modes mentioned in one-cycle mode. However, since they are
            values for the mode2 parameter, the names must all end with 2,
            e.g. G_CC_MODULATEI2.



Custom Modes 

Color Combiner (CC) can be programmed more specifically when you
design your own color combine modes. To define a new mode use the
format:

    #define G_CC_MYNEWMODE a, b, c, d, A, B, C, D

where the color output will be (a-b)*c+d and the alpha output will be
(A-B)*C+D. The values you can use for each of a, b, c, d, A, B, C, and D are:

COMBINED         combined output from cycle 1 mode
TEXEL0           texture map output
TEXEL1           texture map output from tile+1
PRIMITIVE        PrimColor
SHADE            Shade color
ENVIRONMENT      Environment color
CENTER           chroma key center value
SCALE            chroma key scale value
COMBINED_ALPHA   combined alpha output
TEXEL0_ALPHA     texture map alpha
TEXEL1_ALPHA     texture map alpha from tile+1
PRIMITIVE_ALPHA  PrimColor Alpha
SHADE_ALPHA      Shade alpha
ENV_ALPHA        Environment color alpha
LOD_FRACTION     LOD fraction
PRIM_LOD_FRAC    Prim LOD fraction
NOISE            noise (random)
K4               color convert constant K4
K5               color convert constant K5
1                1.0
0                0.0

Then you can use your new mode just like a regular mode:

    gDPSetCombineMode(G_CC_MYNEWMODE, G_CC_MYNEWMODE);



Chroma Key 

The color combiner can be used to perform "chroma keying", which is a
process where areas of a certain color are taken out and replaced with a
texture. This is a similar effect to "blue screen photography", or as seen on
the television news weather maps.

The theory is quite simple; a key color is provided, and all pixels of this color
are replaced by the texel color requested. The key color is actually specified
as a center and width, allowing soft-edge chroma keying (for blended
colors):

Figure 13-12 Chroma Key Equations

KeyR = clamp(0, (-abs((R - RCen) * RScl) + RWd), 255)
KeyG = clamp(0, (-abs((G - GCen) * GScl) + GWd), 255)
KeyB = clamp(0, (-abs((B - BCen) * BScl) + BWd), 255)
KeyA = min(KeyR, KeyG, KeyB)

The center, scale, and width parameters have the following meanings:

Center   Defines the color intensity at which the key is active,
         0-255.

Scale    (255/(size of soft edge)). For hard edge keying, set scale
         to 255.

Width    (Size of half the key window including the soft
         edge)*scale. If width > 255, then keying is disabled for
         that channel.

In two-cycle mode, the keying operation must be specified in the second
cycle (key alpha is not available as a combine operand). The combine mode
G_CC_CHROMA_KEY2 is defined for this purpose.
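The chroma key equations can be worked through with a short sketch. It models scale and width as floating-point numbers, which is an assumption; the hardware's fixed-point representation is not described here and is not modelled. The function names are hypothetical.

```c
#include <assert.h>

/* Key value for one channel:
 * clamp(0, -abs((x - center) * scale) + width, 255). */
int key_channel(int x, int center, double scale, double width)
{
    double d = (double)(x - center) * scale;
    double v;

    assert(x >= 0 && x <= 255);
    if (d < 0.0) {
        d = -d;
    }
    v = width - d;
    if (v < 0.0) {
        v = 0.0;
    }
    if (v > 255.0) {
        v = 255.0;
    }
    return (int)v;
}

/* KeyA = min(KeyR, KeyG, KeyB). Index 0 is red, 1 green, 2 blue. */
int key_alpha(const int rgb[3], const int center[3],
              const double scale[3], const double width[3])
{
    int key = 255;
    int i;

    for (i = 0; i < 3; i++) {
        int k = key_channel(rgb[i], center[i], scale[i], width[i]);
        if (k < key) {
            key = k;
        }
    }
    return key;
}
```

With scale 255 and width 255 the key is hard edged: only an exact match with the center gives a non-zero key. With a soft edge of 16 levels (scale 255/16, width 255) the key falls off linearly with distance from the center.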

The command

    gsDPSetCombineKey(G_CK_KEY);

enables chroma keying.

The commands

    gsDPSetKeyR(cR, sR, wR);
    gsDPSetKeyGB(cG, sG, wG, cB, sB, wB);

allow you to set the parameters for each channel.
The BL takes the combined pixels and blends them against the framebuffer
pixels. Transparency is accomplished by blending against the framebuffer
color pixels. Polygon edge antialiasing is performed, in part, by the BL using
conditional color blending based on depth range. The BL can also perform
fog operations in two-cycle mode.

Figure 13-13 Blender State and Input/Output
Surface Types

The BL can perform different conditional color-blending and z-buffer
updating. Therefore, it can handle semantically different surface and line
types. Figure 13-14 illustrates these types.

Figure 13-14 Surface Types

(Diagram: decal, opaque, interpenetrating surface, and transparent surface.)
Antialiasing Modes

The most important feature of the BL is its participation in antialiasing.
Basically, the BL conditionally blends or writes pixels into the framebuffer
based on depth range. Then the video display logic applies a spatial filter to
account for surrounding background colors to produce antialiased
silhouette edges.

The antialiasing scheme properly antialiases most pixels; only a small set of
corner cases have errors, and these are negligible. This algorithm requires
ordered rendering sorted by surface or line types. Here is the rendering order
and surface/line types for z-buffer antialiasing mode:

• All opaque surfaces are rendered.

• All opaque decal surfaces are rendered.

• All opaque interpenetrating surfaces are rendered.

• All of the translucent surfaces and lines are rendered last. These can
be rendered in any order. However, the proper depth order gives
proper transparency.

Note: There is an additional optimization discussed later; if z-buffered
surfaces in the scene are rendered in approximately front-to-back order,
the fill rate is improved because the z-buffer test is a read only (no write)
for obscured pixels.

Besides the antialiased z-buffer rendering mode, the other three
combinations also exist: antialiased/not z-buffered, z-buffered/not
antialiased, not z-buffered/not antialiased.

Table 13-24 One-Cycle Mode gsDPSetRenderMode(mode1, mode2)

Parameter   Value
mode1       G_RM_FOG_SHADE_A
            G_RM_FOG_PRIM_A
            G_RM_PASS
            or one of the primitive rendering modes,
            e.g. G_RM_AA_ZB_OPA_SURF
mode2       e.g. G_RM_AA_ZB_OPA_SURF2

Note: Even if you are only in one-cycle mode, mode2 should be
programmed. The mode2 value is always mode1 appended with "2".



Table 13-25 Two-Cycle Mode gsDPSetRenderMode(mode1, mode2)

Parameter   Value
mode1       G_RM_FOG_SHADE_A
            G_RM_FOG_PRIM_A
            G_RM_PASS
mode2       same as one-cycle mode mode2 values



Note: When setting the cycle type to G_CYC_FILL or G_CYC_COPY, make
sure to use the command g*DPSetRenderMode(G_RM_NOOP,
G_RM_NOOP2) to guarantee that the blender is in a safe state.



BL Internal Color Registers 

BL has two internal color registers, fog and blend color. These values are
programmable and can be used for geometry with fog or constant
transparency.

Table 13-26 gsDPSetFogColor(r, g, b, a), gsDPSetBlendColor(r, g, b, a)

Parameter    Value
r, g, b, a   color



Alpha Compare 

BL can compare the incoming pixel alpha with a programmable alpha source
to conditionally update the framebuffer. This has traditionally allowed nice
tree-outlined billboards and other complex, outlined, billboard objects.
Besides thresholding against a value, the BL can also compare against a
dithered value to give a randomized particle effect.

Table 13-27 gsDPSetAlphaCompare(mode)

Parameter   Value
mode        G_AC_NONE
            G_AC_THRESHOLD
            G_AC_DITHER

Note: When using mode G_AC_THRESHOLD, alpha is thresholded against
blend color alpha.

Note: Another way to do billboard cutouts, which often provides better
antialiasing, is to turn Alpha Compare off (G_AC_NONE) and instead use
one of the TEX_EDGE render modes, such as G_RM_AA_ZB_TEX_EDGE.



Using Fog

The blender performs the fog operation. Fog is described fully in "Vertex Fog
State" on page 169. Fog is performed by the RSP and the RDP in cooperation.
The RSP takes the z value and places it in the alpha channel of each pixel.
The RDP then uses this alpha channel to blend the color from the color
combiner with the fog color. The larger the Z value (the farther the pixel is
from the viewer's eye), the closer the pixel's color gets to the fog color. The RSP
part of this operation is enabled with gsSPSetGeometryMode:

    gsSPSetGeometryMode(G_FOG),

and can be adjusted with gsSPFogPosition:

    gsSPFogPosition(FOG_MIN, FOG_MAX),

The RDP part of fogging is enabled by telling the blender how to use Alpha. 
Fog can be used in one cycle mode for non-antialiased opaque surfaces only: 

    /* 1 cycle mode */
    gsDPSetCycleType(G_CYC_1CYCLE),
    /* blend fog in ZB mode (non-AA OPA_SURF modes only) */
    gsDPSetRenderMode(G_RM_FOG_SHADE_A, G_RM_ZB_OPA_SURF2),
    /* set the fog color */
    gsDPSetFogColor(RED, GREEN, BLUE, ALPHA),
    /* setup the RSP */
    gsSPFogPosition(FOG_MIN, FOG_MAX),
    gsSPSetGeometryMode(G_FOG),

It can be used for other surface types (or with antialiasing) in 2 cycle mode:

    /* 2 cycle mode */
    gsDPSetCycleType(G_CYC_2CYCLE),
    /* blend fog. use any standard render mode for cycle 2 */
    gsDPSetRenderMode(G_RM_FOG_SHADE_A, G_RM_AA_ZB_OPA_SURF2),
    /* set the fog color */
    gsDPSetFogColor(RED, GREEN, BLUE, ALPHA),
    /* setup the RSP */
    gsSPFogPosition(FOG_MIN, FOG_MAX),
    gsSPSetGeometryMode(G_FOG),

As an alternative to G_RM_FOG_SHADE_A (for the first cycle of
gsDPSetRenderMode) you can use G_RM_FOG_PRIM_A, which will use the
alpha value in PrimColor to set the fog value. If you use this mode, then the
RSP's part of fog is unnecessary and the gsSPFogPosition and
gsSPSetGeometryMode macros are not necessary. Instead set the fog value
per primitive with the gsDPSetPrimColor macro:

    gsDPSetPrimColor(0, 0, 0, 0, 0, FOG_VALUE),

where the FOG_VALUE is 0 for no fog and 0xff for full fog.
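The effect of the fog alpha on a pixel can be pictured with a minimal sketch. The text only says that the alpha blends the combiner color toward the fog color; the linear blend and rounding used below are assumptions, and fog_blend is a hypothetical name, not an SDK function.

```c
#include <assert.h>

/* Blend one 8-bit color channel toward the fog color.
 * fog_alpha is 0 for no fog and 0xff for full fog. */
unsigned char fog_blend(unsigned char color, unsigned char fog,
                        unsigned char fog_alpha)
{
    unsigned int a = fog_alpha;
    unsigned int out = ((unsigned int)color * (255u - a)
                        + (unsigned int)fog * a + 127u) / 255u;

    assert(out <= 255u);
    return (unsigned char)out;
}
```

A fog alpha of 0 leaves the combiner color unchanged, and 0xff replaces it with the fog color.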

Note that objects with fog can still be transparent. The alpha value used to
modulate fog comes from the triangle renderer. The alpha value that comes
from the color combiner is independent of that renderer fog alpha. For
example, the color combiner can be set to use the alpha value from a texture
map, and fog will still work with the alpha value from the renderer. You
cannot, however, use vertex alpha with fog. The alpha supplied in the
vertices will be ignored, and if the color combiner selects a SHADE alpha, it
will get the fog alpha value instead (not what was intended).
Depth Source

The depth value used in the depth buffer compare is generally taken from
the Z value of the pixel, determined by interpolating the z values at the 3
vertices of the triangle containing the pixel. However, it is sometimes
desirable to set a single Z value which will be used for an entire primitive. This
is actually necessary when rendering Z-buffered rectangles (gDPFillRect
and gSPTextureRect) since these primitives do not have a Z value associated
with them. To use a single Z value for an entire primitive, the Z value is
placed in the PrimDepth register and the Z source select is set to get Z from
the PrimDepth register:

    gsDPSetDepthSource(G_ZS_PRIM),
    gsDPSetPrimDepth(z, dz),

The value to use for z is the screen Z position of the object you are rendering.
This is a value ranging from 0x0000 to 0x7fff, where 0x0000 usually
corresponds to the near clipping plane and 0x7fff usually corresponds to the
far clipping plane. To synchronize Z for PrimDepth with a Z for a triangle it
is important to understand how the triangle's Z gets computed. The
modeling coordinate vertex is multiplied by the modelview and projection
matrices, resulting in a 4-component homogeneous coordinate (x,y,z,w). The
screen z value is computed by the RSP as

    screenZ = 32 * ((z/w) * Viewport.vscale[2] + Viewport.vtrans[2])

Note: Viewport.vscale[2] and Viewport.vtrans[2] are usually both G_MAXZ/2
= 0x1ff, which makes the formula: screenZ = (z/w)*0x3fe0 + 0x3fe0. Since
(z/w) ranges from -1.0 to +1.0 the result will range from 0x0 to 0x7fc0.
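The screenZ formula above can be checked with a small sketch. The function name and the use of double for z and w are assumptions for illustration; the actual microcode works in fixed point.

```c
#include <assert.h>

/* screenZ = 32 * ((z/w) * vscale_z + vtrans_z), following the formula
 * in the text. vscale_z and vtrans_z stand for Viewport.vscale[2] and
 * Viewport.vtrans[2]. */
int screen_z(double z, double w, int vscale_z, int vtrans_z)
{
    assert(w != 0.0);
    return (int)(32.0 * ((z / w) * (double)vscale_z + (double)vtrans_z));
}
```

With both viewport values set to 0x1ff, a point on the near plane (z/w = -1.0) maps to 0x0 and a point on the far plane (z/w = +1.0) maps to 0x7fc0, matching the range given in the note.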

Note: For microcode programmers: The 32* part of this equation is done in
the setup microcode. The other parts of this equation are done in the vertex
processing microcode.

So if you want to position a rectangle at a specific modeling coordinate
position, run the modeling coordinate of the position through the
modelview and projection matrix, and then compute its screenZ value based
upon the formula above. This is the value to use for z in the
gsDPSetPrimDepth command.

The dz value should be set to 0. This value is used for antialiasing and objects
drawn in decal render mode and must always be a power of 2 (0, 1, 2, 4, 8, ...
0x4000). If you are using decal mode and part of the decaled object is not
being rendered correctly, try setting this to powers of 2. Otherwise use 0.
MI: Memory Interface

Memory Interface (MI) simply interfaces to the framebuffer memory. It has
programmable color and z-buffer pointers, a 32-bit fill color value used in
the FILL cycle type (see "Fill Mode"), and an enable for color
dither.

Figure 13-15 Memory Interface State and Input/Output

(Diagram: the MI state is the fill color, color image ptr, and mask image ptr;
it sends pixels to the framebuffer and reads framebuffer pixels back.)



Image Location and Format 

The framebuffer is row-ordered, starting at the upper left. The color and
z-buffer image pointers must be 64-byte aligned. The DRAM has dual banks,
one on each 1 MB. By keeping the color and z-buffers on different banks, you
can improve the DRAM access latency when the RDP is seeking DRAM
bandwidth for rendering.

The Nintendo 64 system actually uses 9-bit DRAMs rather than 8-bit
DRAMs, to gain two extra bits per color or z pixel. The color and z formats
are illustrated in Figure 13-16.

Figure 13-16 Color and Z Image Pixel Format
Fill Color

The MI has a 32-bit fill color register that is used in FILL cycle type. Fill color
is typically programmed to a constant value to fill background color and
z-buffers. Since two framebuffer pixels are 18x2=36 bits, while the fill color
register is 32 bits, a few of the bits are repeated. See Figure 13-17 for an
illustration of how it works.

Figure 13-17 Fill Color Register LSB Replication

Table 13-28 gsDPSetFillColor(data32bits)

Parameter    Value
data32bits   Two different macros, one each for color and z, each generate
             16 bits, so do (x << 16) | x to get 32 bits.
             GPACK_RGBA5551(r, g, b, a), a=1 is full coverage. (Typical)
             GPACK_ZDZ(z, dz), z=G_MAXFBZ, dz=0. (Typical)
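The "x << 16 | x" step can be sketched as follows. pack_rgba5551 is a hypothetical stand-in written for this example, assuming 8-bit color inputs and a 5/5/5/1 layout; it is not the SDK's GPACK_RGBA5551 macro, and fill_word is likewise an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit r, g, b and a 1-bit coverage flag into a 16-bit pixel
 * laid out as 5 bits red, 5 bits green, 5 bits blue, 1 bit alpha. */
uint16_t pack_rgba5551(unsigned r, unsigned g, unsigned b, unsigned a)
{
    assert(r <= 255u && g <= 255u && b <= 255u && a <= 1u);
    return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) |
                      ((b >> 3) << 1) | a);
}

/* Duplicate a 16-bit pixel into both halves of the 32-bit fill value,
 * so the fill register covers two framebuffer pixels at once. */
uint32_t fill_word(uint16_t x)
{
    return ((uint32_t)x << 16) | (uint32_t)x;
}
```

For example, opaque red packs to 0xf801, and the corresponding 32-bit fill value is 0xf801f801.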



Dithering 

The RDP pipeline keeps full, 8-bit per RGB component precision
throughout. Dithering can be enabled or disabled to write to the 5-bit per
RGB component DRAM framebuffer format. Dithering is recommended since
it can significantly reduce the Mach banding effect.
Chapter 14

Texture Mapping



Texture mapping, or texturing, is the process of applying an image to a
polygonal surface. There are many graphics books that discuss this topic;
this guide assumes that you are familiar with the basic principles of texture
mapping. This chapter explains the functionality of texture mapping as
implemented in the Reality Display Processor (RDP).

Figure 14-1 Texture Unit Block Diagram

The RDP contains an on-chip texture memory called Tmem, which buffers
all source image data used for texturing. Tmem contains up to eight tiles (a
tile is a rectangular region of an image). A tile is loaded into Tmem using the
LoadTile, LoadBlock, or LoadTlut commands, and described using the SetTile
and SetTileSize commands. If the image is too large to fit entirely in Tmem,
primitives must be subdivided in object space based on their texture
coordinate values so that each primitive references a tile that fits in Tmem.

Texture coordinates (S,T) for each pixel are input to the texture coordinate
unit and can be perspective corrected. Perspective correction is typically
enabled for 3D geometry and disabled for 2D sprites (tex_rect commands).
During this time, the texture coordinate unit calculates which tile descriptor
to use for this primitive. The texture image coordinates are converted to
tile-relative coordinates and wrapped/mirrored and clamped. These tile
coordinates are then used to generate an offset into Tmem. The texture unit
can address 2x2 regions of texels in one or two cycle mode, or 4x1 regions in
copy mode. Copy mode is typically used for blits (block copy of texels) with
a 1:1 texel pixel relationship. In one or two cycle mode, filter or point-sample
can also be selected. Typically, filter will result in a smoother image with less
aliasing. The texture unit also generates S, T, and L fraction values that are
used to bi-linearly or tri-linearly interpolate the texels.

The texture unit supports ten different combinations of texel size and
format:

• 4-bit intensity (I)

• 4-bit intensity w/alpha (I/A) (3/1)

• 4-bit color index (CI)

• 8-bit I

• 8-bit IA (4/4)

• 8-bit CI

• 16-bit red, green, blue, alpha (RGBA) (5/5/5/1)

• 16-bit IA (8/8)

• 16-bit YUV (Luminance, Blue-Y, Red-Y)

• 32-bit RGBA (8/8/8/8)

Significant memory savings can result from the smaller color-index textures 
or intensity textures over the more expensive 16-bit RGBA. It is a good idea 
to experiment with the different texel sizes. One can actually do 2-color 
textures using the intensity types. Also, the intensity-only textures place the 
texel value on the alpha channel as well where it can be used for blending or 
ignored. 
Graphics Binary Interface for Texture



The graphics binary interface (GBI) is a set of macros that create 64-bit
commands that are read and parsed by the RSP microcode. Some of these
commands cause actions or state changes in the RSP. Others are simply
passed through the RSP to the RDP. Below is a list of GBI commands that
control texture. See the corresponding reference (man) page for more details.



Primitive Commands

• g*SPTexture

• g*SPTextureRectangle*

Tile Related Commands

• g*DPSetTile

• g*DPSetTileSize

Load Commands

• g*DPLoadTile*

• g*DPLoadTextureBlock*

• g*DPLoadTLUT*

• gDPSetTextureImage

Sync Commands

• g*DPLoadSync

• g*DPTileSync

Mode Commands

• g*DPSetTextureLUT

• g*DPSetTexturePersp

• g*DPSetTextureDetail

• g*DPSetTextureLOD

• g*DPSetTextureFilter

• g*DPSetTextureConvert
Example Display List



The following display list fragment uses GBI display list commands to
render an object using a 16-bit RGBA texture map. The texture is loaded into
Tmem using the LoadBlock command. The texture coordinates are
perspective corrected. Note that the texture is allowed to wrap on 32-texel
boundaries in the s and t directions. The texture filter bilinearly interpolates
the 2x2 texels output by the texture unit. Finally, the resulting texture color
is multiplied with the object's shade color in the Color Combiner for each
pixel of the object.

    /* Enable textured poly generation in RSP */
    gSPTexture(glistp++, 0x8000, 0x8000, 0, G_TX_RENDERTILE, G_ON);
    gDPSetTextureFilter(glistp++, G_TF_BILERP);
    gDPSetTexturePersp(glistp++, G_TP_PERSP);
    gDPSetCombineMode(glistp++,
        G_CC_MODULATERGB, G_CC_MODULATERGB);
    /* Load Texture Block */
    gDPLoadTextureBlock(glistp++, RGBA16dana, G_IM_FMT_RGBA,
        G_IM_SIZ_16b, 32, 32, 0, G_TX_WRAP, G_TX_WRAP, 5, 5,
        G_TX_NOLOD, G_TX_NOLOD);
    /* Render model display list */
    gSPDisplayList(glistp++, model);
Texture Image Space



Texture coordinates are defined for textured primitives in Texture Image
Space. This space has a range of +/- 1K texels. Tiles are smaller rectangular
regions of a texture that fit into the on-chip texture memory of the RCP
(Tmem).

Figure 14-2 Image Space and Tile Space

(Diagram: the Texture Image Coordinate Space spans -1024, -1024 to
1023.99, 1023.99 and contains the Tile Space rectangle, with corner SH,TH,
and the Primitive.)



Tiles are defined in Texture Image Space using SL, TL and SH, TH
coordinates, as shown in Figure 14-2. Tile coordinates must lie in the positive
S,T quadrant of Texture Image Space. However, texture coordinates of the
primitive can lie in any of the four quadrants of image space. In other words,
primitives can have negative texture coordinates, which can be useful when
wrapping a texture on a very large primitive. Tiles can be up to 1024 columns
wide and up to 256 rows tall. Tiles do not have to be sized to a power of 2
(wrapping and mirroring, however, happen on power-of-2 boundaries).

The texture coordinates of the primitive (in Texture Image Space) are
converted into Tile Space by subtracting the SL,TL from the (possibly
perspective-corrected) texture coordinates of the pixel. This indirection
allows arbitrary placement of the tile with respect to the primitive. This
implies that the texture coordinates can be defined once in the database, and
that the texture can be translated (or slid) with respect to the primitive by
simply manipulating the SL,TL values using the SetTileSize RDP command.
Tile Attributes



The RDP has a small on-chip memory for buffering up to eight tile
descriptors at a time. A tile descriptor contains all the information for a
texture tile including format; size; line; Tmem address; palette; mirror enable
S, T; mask S, T; shift S, T; SL, TL; SH, TH; and clamp S, T.

Format

Format of texels in texture tile.

Table 14-1 Tile Format Encodings

Format Value   Format
0              RGBA
1              YUV
2              CI
3              IA
4              I

Size

Size of texels in texture tile.

Table 14-2

Size Value   Size of texel in bits
0            4
1            8
2            16
3            32
Line

Number of 64-bit words in one row of the tile. Dependent on the row width
as well as texel type/size. When tiles are loaded using the LoadTile
command, the rows are padded to 64-bit boundaries. When LoadBlock is
used to load a texture, it is assumed that the rows have already been padded.
Line can also be used to control the stride through Tmem. By controlling
Line, smaller tiles can be pieced together into one larger continuous tile.
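A Line value for a tightly packed row can be estimated by rounding the row's bit count up to the next 64-bit word. This is a minimal sketch with a hypothetical function name; it ignores any format-specific storage details that the text does not cover.

```c
#include <assert.h>

/* Number of 64-bit words needed for one tile row, rounded up so that
 * each row ends on a 64-bit boundary. */
unsigned line_words(unsigned width_texels, unsigned bits_per_texel)
{
    assert(bits_per_texel == 4u || bits_per_texel == 8u ||
           bits_per_texel == 16u || bits_per_texel == 32u);
    return (width_texels * bits_per_texel + 63u) / 64u;
}
```

A 32-texel row of 16-bit texels needs 8 words, while a 5-texel row of 4-bit texels still occupies one full word because of the padding.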



Tmem Address

Tile offset (0-511) in Tmem (64-bit) words.

Palette

Palette number (0-15) of 4-bit Color Index (CI) textures. An 8-bit index into
the high half of Tmem is formed by placing the palette number in the 4 MSBs
and the 4-bit texel value in the 4 LSBs. The color in Tmem at this index
becomes the color of the pixel. Therefore, for a 4-bit CI texture, you may
select one of 16 palettes with each palette having up to 16 entries. Palettes
can be loaded into Tmem using the LoadTLUT command or, optionally, the
LoadBlock command.
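The index construction described above can be written in one line. The function name is hypothetical and the sketch only covers the 4-bit CI case.

```c
#include <assert.h>

/* Form the 8-bit lookup index for a 4-bit CI texel: palette number in
 * the 4 most significant bits, texel value in the 4 least significant. */
unsigned tlut_index(unsigned palette, unsigned texel4)
{
    assert(palette < 16u && texel4 < 16u);
    return (palette << 4) | texel4;
}
```

For instance, texel value 5 in palette 3 selects entry 0x35.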

Mirror Enable S,T

Enables mirroring of texture coordinates. When the bit indicated by the
(Mask Value + 1) is 0, the coordinates are unchanged. When this bit is 1,
however, the coordinates are inverted. Useful for symmetric patterns like
trees, faces, etc. For example, a mask of 2 with mirror enabled would yield
the following texture coordinates:

0,1,2,3,4,5,6,7,... Input Coordinate
0,1,2,3,3,2,1,0,... Mirrored Coordinate






Mask S,T

Number of bits of tile coordinate to let through. For example, a mask of 1
indicates one bit of the texture coordinate should come through the mask,
giving a pattern of 0,1,0,1... As another example, a mask value of 5 indicates
that the texture should wrap every 32 texels, i.e., the lower 5 bits are passed
through the mask. A mask value of 0 forces clamping the texture
coordinates to be between (SL,TL),(SH,TH) inclusive. The mask value + 1
indicates the bit position that is looked at for mirroring. See discussion in
Mirror Enable, above.
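Wrapping and mirroring with a non-zero mask can be sketched on integer tile coordinates as follows. The function name is hypothetical, and fractional coordinate bits are left out for simplicity.

```c
#include <assert.h>

/* Keep the low `mask` bits of the coordinate (wrap). If mirroring is
 * enabled and the next bit up is set, reverse the direction. */
unsigned mask_mirror(unsigned coord, unsigned mask, int mirror)
{
    unsigned span, low;

    assert(mask >= 1u && mask < 16u);
    span = (1u << mask) - 1u;       /* mask 2 gives the range 0..3 */
    low = coord & span;
    if (mirror && ((coord >> mask) & 1u)) {
        low = span - low;           /* odd repeats run backwards */
    }
    return low;
}
```

With a mask of 2 and mirroring enabled, inputs 0 through 7 give 0,1,2,3,3,2,1,0, the same sequence as the Mirror Enable example.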



Shift S,T

Shift texture coordinates after perspective divide. Used in MIP maps and
possibly for precision reasons (see the discussion of Detail texture later in
this document). Also useful for combining two differently scaled textures.

Table 14-3 Shift Encoding

Shift Value   Shift
0             no shift
1             >> 1
2             >> 2
3             >> 3
4             >> 4
5             >> 5
6             >> 6
7             >> 7
8             >> 8
9             >> 9
10            >> 10
11            << 5
12            << 4
13            << 3
14            << 2
15            << 1
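The encoding in Table 14-3 can be decoded with a few lines. The function name is hypothetical, and the sketch works on unsigned integer coordinates only.

```c
#include <assert.h>

/* Apply a Table 14-3 shift value: 0 is no shift, 1..10 shift right by
 * that amount, and 11..15 shift left by (16 - value). */
unsigned apply_shift(unsigned coord, unsigned shift)
{
    assert(shift <= 15u);
    if (shift == 0u) {
        return coord;
    }
    if (shift <= 10u) {
        return coord >> shift;
    }
    return coord << (16u - shift);
}
```

For example, shift value 1 halves a coordinate, while shift value 15 doubles it.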



SL,TL

When rendering, the starting texel column, row of tile in texture image
space, 10.2 fixed point. Can be used to slide texture w.r.t. the primitive.
When loading, the starting texel column, row within the DRAM texture
image, 10.2 fixed point.

SH,TH

When rendering, the ending texel column, row of tile in texture image space,
10.2 fixed point. Used for clamping only. When loading, the ending texel
column, row within the DRAM texture image.



Clamp S,T

Enable clamp during wrap or mirror. When not masking, Clamp S,T is
ignored and clamping is implicitly enabled. This bit allows clamping the
texture coordinates when the mask is non-zero. Useful when you want to
mirror and then clamp, like an airplane wing insignia. The border of the
insignia would have an alpha of 0. For example, SH = 11, mask = 2, mirror =
1, clamp = 1:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,... Input Coordinate
0,1,2,3,3,2,1,0,0,1,2,3,3,3,3,3,...       Mirrored/Clamped Coordinate
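The clamp example can be reproduced with a sketch that clamps first and then applies the mask and mirror. The ordering of these steps and the function name are assumptions chosen to match the sequence shown above; the lower bound is taken as 0, and fractional bits are ignored.

```c
#include <assert.h>

/* Clamp an integer tile coordinate to sh, then wrap and mirror it.
 * A mask of 0 means no masking, in which case clamping is implicit. */
unsigned tile_coord(unsigned coord, unsigned sh, unsigned mask,
                    int mirror, int clamp)
{
    unsigned span, low;

    assert(mask < 16u);
    if ((clamp || mask == 0u) && coord > sh) {
        coord = sh;
    }
    if (mask == 0u) {
        return coord;
    }
    span = (1u << mask) - 1u;
    low = coord & span;
    if (mirror && ((coord >> mask) & 1u)) {
        low = span - low;
    }
    return low;
}
```

With SH = 11, mask = 2, mirror = 1, and clamp = 1, inputs 12 through 15 are held at the value produced for 11, so the pattern stops repeating past the end of the tile.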
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Tile Descriptor Loading 



Tile descriptors must be loaded using the RDP command. SetTile. This 
command loads the format, size, litw;^£rne?n address, palette, clamp, mirror, mask, 
and shift parameters for the til&ituinBefjspecified. The SL, TL, SH, and TH 
parameters are set by the RDP^Ipirunar^^S^effi^Szze, LoadTile, LoadBlock, 
and LoadTLUT. "' v §| 

One important point to keep in mind is that tile descriptors are used both
when loading textures and when rendering textures. In particular, when
loading a texture, the texture coordinate unit uses the Tmem address, line,
format, and size information from the tile specified in the
LoadTile/Block/TLUT command. Therefore, this information must be loaded
into the tile descriptor prior to executing the LoadTile/Block/TLUT command.
Also, the LoadTile/Block/TLUT command automatically writes the
SL, TL, SH, TH information into the tile descriptor. In the case of a
LoadTile command, this is probably the information you wanted. In the case
of a LoadBlock or LoadTLUT command, however, this information must be
overwritten with a SetTileSize command after the texture load.

The GBI commands for loading tile descriptors directly are:

• g*DPSetTile

• g*DPSetTileSize

The GBI commands that affect tile descriptors are:

• g*DPLoadTile*

• g*DPLoadTextureBlock*

• g*DPLoadTLUT*

Note: The load commands above use a double-buffered tile system for
loading/rendering. When loading, the tile G_TX_LOADTILE is used, and
when rendering, the tile G_TX_RENDERTILE is used. This simple scheme
avoids having to insert TileSyncs between loading and rendering. However,
if you need to use more than one tile for some reason, make sure that you
use g*DPSetTile and g*DPSetTileSize to set the tile descriptors properly.
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Texture Pipeline 



Figure 14-3 Texture Pipeline

Figure 14-4 Texture Pipeline, cont'd.
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Tile Selection 



Functionality 

Tile descriptors are used both when loading a texture and when rendering a
texture. This section discusses the selection of tiles when rendering. The
use of tile descriptors when loading textures is discussed in the Loading
Textures section.

There are basically two ways to index into tile memory: explicitly via a
user-defined tile number, or indirectly using a combination of the
user-defined tile number and the level of detail (LOD) of the pixel.

In two-cycle mode, it is possible to access different tile descriptors in
each cycle. The computation of tile indices for each cycle depends on
several mode bits and is described in the following sections.



LOD Disabled

With LOD disabled, the user specifies the texture tile for a primitive
directly using the gSPTexture command. This tile number is inserted by
microcode into the header for each subsequent primitive and is referred to
as the primitive tile number. Two-cycle non-LOD mode can be useful for
combining two arbitrary textures (morphing, etc.). The calculation of the
tile descriptor index is straightforward when LOD is disabled:

Table 14-4 Tile Descriptor Index Generation with LOD Disabled

Cycle    Tile Index

0        primitive tile

1        primitive tile + 1

LOD Enabled

The lod_en mode bit in SetOtherModes determines if tile indices are
determined using Level of Detail (LOD) or from the primitive command
directly.

With LOD enabled, the tile index is a function of the Level of Detail (LOD)
of the primitive. LOD is computed as a function of the difference between
perspective-corrected texture coordinates of adjacent pixels to indicate the
magnification/minification of the texture in screen space (texel/pixel
ratio). The LOD module also calculates an LOD fraction for third-axis
interpolation between MIP maps. The combination of LOD-derived tile
coordinates and fraction, a particular tile descriptor arrangement, and
trilinear filtering allows the implementation of MIP maps. Notice that MIP
mapping is a specialized use of the general texture hardware. Other types of
mappings are possible. The LOD calculation makes the following features (and
maybe more) possible:

• trilinear MIP mapping

• sharpened texture

• detail texture

The LOD calculation depends on the following inputs:

• LOD: level of detail of the pixel (texels/pixel), derived per pixel

• min_level (0.5): minimum LOD fraction clamp for sharpen or detail
  modes, from the SetPrimColor RDP command

• max_level (0-7): number of MIP maps minus one, from the primitive
  via the gSPTexture command

• detail_en: enable for detail texture, from the SetOtherModes RDP
  command

• sharp_en: enable sharpen mode, from the SetOtherModes RDP command

• prim_tile (0-7): primitive tile number, from the primitive via the
  gSPTexture command

• lod_en: enable for LOD calculation, from the SetOtherModes RDP
  command
The LOD calculation produces the following outputs:

• l_frac (s,0.8): LOD fraction for third-axis interpolation

• l_tile (0-7): tile descriptor index into tile memory

The LOD per pixel is clamped to min_level. The LOD tile index is then
calculated using the equation:

l_tile = log2((int)lod_clamp)

So, for example, an LOD of 7.5 would be converted to an l_tile of 2. This
index is clamped to max_level and then added to the prim_tile. For example,
the tile arrangement for a MIP map with prim_tile = 2 and max_level = 3
would be arranged as shown in Table 14-5.

Table 14-5 Example of Tile Address and LOD Index Relationship

Tile Address    LOD Index

1               -

2               0

3               1

4               2

5               3

6               -



The l_frac is derived by dividing the clamped LOD by 2^l_tile and keeping
the fractional part. For example, an LOD of 7.5 would yield an l_frac of
0.875 (7.5 / 4 = 1.875). The l_frac is modified depending on the mode bits
detail_en and sharp_en. Note that the detail and sharpen modes discussed
below are exclusive. If enabled simultaneously, special effects may result.
If neither detail_en nor sharp_en is true, then the l_frac is passed to the
color combiner unmodified.
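The tile index and fraction calculations described above can be sketched in
C as follows. This is an illustrative model under stated assumptions, not
the RDP's implementation: the names LodResult and lod_split are
hypothetical, floating point stands in for the hardware's fixed-point
values, an LOD below 1.0 is simply treated as 1.0, and the fraction is
computed before l_tile is clamped to max_level.

```c
/* Result of the LOD calculation sketched in the text (hypothetical type). */
typedef struct {
    int tile;       /* tile descriptor index: prim_tile + clamped l_tile */
    double l_frac;  /* fractional part of lod / 2^l_tile */
} LodResult;

LodResult lod_split(double lod, int max_level, int prim_tile)
{
    LodResult r;
    unsigned whole;
    int l_tile = 0;
    double scaled;

    if (lod < 1.0)                  /* assumption: magnification acts as 1.0 */
        lod = 1.0;

    whole = (unsigned)lod;          /* (int)lod_clamp */
    while ((whole >>= 1) != 0)      /* l_tile = floor(log2(whole)) */
        l_tile++;

    scaled = lod / (double)(1u << l_tile);  /* always in [1.0, 2.0) */
    r.l_frac = scaled - 1.0;

    if (l_tile > max_level)         /* clamp to the coarsest MIP level */
        l_tile = max_level;
    r.tile = prim_tile + l_tile;
    return r;
}
```

With prim_tile = 2 and max_level = 3, an LOD of 7.5 selects tile 4 with a
fraction of 0.875, matching the worked example and Table 14-5.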

Sharpen and detail modes change the behavior of the tile index calculation
when magnifying. The texture is magnified when you get so close to the
primitive that one texel is being applied to many pixels, even using the
highest-resolution texture in the MIP map.

Table 14-6 Generation of Tile Descriptor Index With LOD Enabled and
Magnifying

Cycle   Detail                   Sharpen                  !Detail & !Sharpen

0       prim_tile + l_tile       prim_tile + l_tile       prim_tile + l_tile

1       prim_tile + l_tile + 1   prim_tile + l_tile + 1   prim_tile + l_tile + 1

Table 14-7 Generation of Tile Descriptor Index With LOD Enabled and Not
Magnifying

Cycle   Detail                   Sharpen                  !Detail & !Sharpen

0       prim_tile + l_tile + 1   prim_tile + l_tile       prim_tile + l_tile

1       prim_tile + l_tile + 2   prim_tile + l_tile + 1   prim_tile + l_tile + 1

Also note that l_tile is clamped to max_level when at the coarsest level of
detail.
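As a reading aid, the entries of Tables 14-6 and 14-7 can be written as one
small C function. This is a sketch of how the tables are read here, not a
statement of the hardware logic: the names TilePair and select_tiles are
hypothetical, and every cycle 1 entry in the magnifying case is taken to be
prim_tile + l_tile + 1.

```c
/* Tile descriptor indices used in the two cycles (hypothetical type). */
typedef struct {
    int cycle0;
    int cycle1;
} TilePair;

TilePair select_tiles(int prim_tile, int l_tile, int magnifying,
                      int detail_en, int sharp_en)
{
    TilePair t;
    int base = prim_tile + l_tile;
    /* With detail enabled the detail texture sits at prim_tile, so the
     * MIP levels start one tile later when not magnifying. */
    int offset = (detail_en && !magnifying) ? 1 : 0;

    (void)sharp_en;  /* sharpen uses the same indices as the default case */

    t.cycle0 = base + offset;
    t.cycle1 = base + offset + 1;
    return t;
}
```

The sharp_en argument is kept only so the call mirrors the mode bits named
in the tables; in this reading it does not change the indices.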
MIP Mapping

An example of the tile arrangement for a MIP map is shown in Figure 14-5.

Figure 14-5 MIP Map Tile Descriptors

(Figure: MIP map pyramid, no detail map. Prim_Tile = 2, Max_level = 4,
Lod_en = 1, Sharp_en = 0 or 1, Detail_en = 0.)

To implement trilinear MIP mapping, the RDP must be in two-cycle mode.
A tile is referenced in each of the cycles and linearly interpolated using
the l_frac in the color combiner.

For more control of interpolation between two texture tiles, a register
prim_frac (0.8) is provided that can be used as an input to the color
combiner. prim_frac is set by the SetPrimColor command.

Care should be taken in the off-line generation of the MIP maps. Depending
on the filter used for generating the levels, the different levels can end
up unaligned. For example, if using a simple box filter for generating the
coarser levels, an offset of 0.5 should be added to the SL and TL of each
level to ensure that they align when laid on top of one another. Whether
these or other offsets are necessary depends on the filter used. Typically,
higher-order filters will result in higher-quality MIP maps.
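The simple box filter mentioned above can be sketched as an off-line C
routine. This is a minimal illustration, not a tool from the development
kit: the function name box_downsample is hypothetical, and it assumes an
8-bit intensity image with even width and height. Each coarse texel is the
rounded average of a 2 x 2 block of finer texels, so its center falls
between the fine texel centers, which is the reason for the offset
discussed in the text.

```c
/* Build the next coarser MIP level from an 8-bit intensity image.
 * src is w x h texels (w and h assumed even); dst receives
 * (w / 2) x (h / 2) texels. Hypothetical off-line helper. */
void box_downsample(const unsigned char *src, int w, int h,
                    unsigned char *dst)
{
    int x, y;

    for (y = 0; y < h / 2; y++) {
        for (x = 0; x < w / 2; x++) {
            const unsigned char *row0 = src + (2 * y) * w + 2 * x;
            const unsigned char *row1 = row0 + w;
            int sum = row0[0] + row0[1] + row1[0] + row1[1];

            dst[y * (w / 2) + x] = (unsigned char)((sum + 2) / 4);
        }
    }
}
```

Calling the routine repeatedly, each time on the previous output, produces
the successive levels of the pyramid.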

Another word of caution. In computer graphics, extremely high-frequency
textures are a bad thing; going from black to white in one texel is the
highest frequency. High-frequency maps are more likely to alias (flicker)
when edge-on or far away. So when generating map data, use common sense
and possibly lower-frequency texture data to avoid these problems.
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Magnification 

Figure 14-6 Magnification Interval Relative to LOD
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Detail Texture 

Even with trilinear MIP mapping, textures can look blurry under
magnification (that is, when 0.0 < LOD <= 1.0). One way of avoiding this is
to use very large textures that contain high-frequency detail. But this
would be expensive in Tmem.

Detail mode comes into play in magnification. The finest level of the base
texture is combined with a (usually small) detail texture in such a way as
to repeat the detail texture over the base texture several times. A base
texel would, upon magnification, appear to contain four or more detail
texels blended with the base-texel color, thus providing high-frequency
information without having to sacrifice large amounts of Tmem. This can be
used very effectively, for example, to provide motion cues when close to
the terrain.

Detail texture mode is most effective in situations where the
high-frequency information and overall hue are relatively consistent
throughout the texture. To convert a high-resolution image into a
low-resolution image (for the base texture) and a detail texture, follow
this procedure:

1. Make the low-res image by filtering the high-res image to the desired
   size. This will become the base level.

2. Any n x n sub-tile of the high-res image can be used as a detail
   texture. This sub-tile should preferably be made to match across s and
   t borders so that when it is repeated over the base texture, the seams
   are not visible. Detail textures can have a different texel type than
   the base texture (subject to Tmem restrictions). Often, it is
   sufficient to use a 4-bit or 8-bit intensity detail texture.

A very effective and efficient implementation of detail texture involves
use of the base texture itself as the detail texture but at a different
resolution. This works well for objects and terrains with a 'fractal
character' where different resolutions of the object look similar. In such
cases it might be appropriate to set the min_level parameter to 0 to allow
the detail texture to completely replace the base texture at high
magnifications.

Since the detail texture is combined with the base texture, a color shift
may result. This can be avoided by choosing the detail texture color scheme
to match the base texture colors so that this effect is minimized. The
min_level parameter can also be used to keep the detail texture from
completely replacing the base texture by setting it to a value greater than
0. This will cause a certain minimum amount of the base texture to always
be blended in with the detail texture, thus minimizing the color shift.

The shift field of the tile pointing to the detail texture is used to shift
the incoming s and t coordinates before indexing into the map. This shift
then determines the base-texel to detail-texel ratio.

For example, if the detail tile's shift was set to shift left by 1 (the
shift of the finest level of the base texture being 0, of course), each
base texel, upon magnification, would display 4 detail texels blended with
the base-texel color. A shift left of 2 would result in 16 detail texels
per base texel, and so on. Larger shifts result in more aliasing in the
detail texture since the interpolation occurs between widely different
magnifications.

Keep in mind that the shift values compromise between the base-texel to
detail-texel ratio and the effectiveness of the bilerp operation on the
detail texture. This is because the number of fractional bits in the s and
t coordinates (s10.5) is limited to 5 bits. Hence, a shift left of 3 bits
will leave only 2 bits of fraction within each texel to do the bilerp.

Detail textures must always be pointed to using PRIM_TILE.
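The trade-off between the detail tile's shift and the bilerp precision can
be put into numbers with two small C helpers. These are illustrative only;
the function names are hypothetical, and the arithmetic simply restates the
text (the ratio doubles per axis for every bit of left shift, and each bit
of shift consumes one of the 5 fraction bits of an s10.5 coordinate).

```c
/* Detail texels displayed per base texel for a given left shift. */
int detail_texels_per_base_texel(int shift_left)
{
    int per_axis = 1 << shift_left;     /* ratio along s, and along t */
    return per_axis * per_axis;
}

/* Fraction bits left for the bilerp inside each detail texel. */
int bilerp_fraction_bits(int shift_left)
{
    int bits = 5 - shift_left;          /* s10.5 has 5 fraction bits */
    return bits > 0 ? bits : 0;
}
```

A shift of 1 gives 4 detail texels per base texel with 4 fraction bits; a
shift of 3 gives 64 detail texels but only 2 fraction bits.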

Figure 14-7 MIP Map With Detail Texture Tile Descriptors

(Figure: MIP map pyramid, with detail map. Prim_Tile = 1, Max_level = 4,
Lod_en = 1, Sharp_en = 0, Detail_en = 1.)



If detail_en is true and the LOD is less than 1.0, indicating that the LOD
is below the finest MIP map level, the fraction is a table lookup of the
l_frac. Currently, the table lookup is simply identity, so the fraction is
not modified in detail mode. In order to always have a portion of the
base texture visible, l_frac is clamped to be greater than min_level.
Min_level should be determined by experimentation. This fraction can then
be used to interpolate between the detail texture (pointed to by prim_tile)
and the base texture (pointed to by prim_tile+1). Filtering within the
detail texture can be controlled as usual by using the SetOtherModes bits
to be POINT or BILERP.
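The detail-mode blend described above can be sketched in C. This is an
illustration under assumptions, not the color combiner's actual arithmetic:
the function name detail_blend is hypothetical, floating point replaces the
hardware's fixed-point values, and the blend is assumed to be a plain
linear interpolation in which a fraction of 0 selects the detail texel and
a fraction of 1 selects the base texel.

```c
/* Blend one detail texel with one base texel in detail mode.
 * l_frac below min_level is raised to min_level so that some of the
 * base texture always remains visible. Hypothetical helper. */
double detail_blend(double detail, double base, double l_frac,
                    double min_level)
{
    double frac = (l_frac < min_level) ? min_level : l_frac;

    return detail + (base - detail) * frac;
}
```

With min_level = 0 the detail texture can replace the base texture
entirely at high magnification; larger values keep more of the base color.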

Sharpen Mode

Sharpen mode is used in a situation similar to that of detail texture. The
advantage of sharpen over detail is that sharpen is essentially free. It
doesn't require an additional detail map. Instead, it extrapolates using
the two finest MIP map levels. Consider an image with high-contrast edges
that has been magnified to the point where the edge details are becoming
blurry. Sharpen mode increases the apparent sharpness of the texture edge
by inverting the l_frac (extrapolating) as shown in Figure 14-8, "Sharpen
Extrapolation," on page 238.

Bilinear Filtering and Point Sampling 

The DP hardware treats texture coordinates differently based on whether the
DP is in point sample mode or bilerp mode. In point sample mode, texels can
be thought of as 1 x 1 squares with the sample point at the top left-hand
corner of the texel (where the 's' and 't' coordinate axes run left to
right and top to bottom, respectively). This means that to map a modeler's
floating-point texture coordinate output (u,v) into the DP fixed-point
texture coordinates (s,t) for, say, a 32x32 texture (s ranges from 0-31 and
t ranges from 0-31), the mapping

s = u*32;

t = v*32;

would work consistently and would map the full 32x32 texture onto a
polygon with (u,v) coordinates in the range [0.0-1.0]. This is because the
above mapping would result in a u range of [0.0-1.0] being mapped to an s
range of [0-32], which would cover the region from the left edge of texel 0
to the right edge of texel 31.

On the other hand, in bilerp mode the DP treats a texel as a 1 x 1 square
with the sample point at the center, and the above mapping would cover the
region from the middle of texel 0 to the middle of texel 32, which goes
beyond the extent of the texture.

The mapping

s = u*(32 - 1);
t = v*(32 - 1);

doesn't work either, since it maps a (u,v) range of 0.0 - 1.0 to an (s,t)
range of 0.0 - 31.0, which would cover a region from the middle of texel 0
to the middle of texel 31, causing both texel 0 and texel 31 to be half
displayed.

The mapping that would make the textured primitive match exactly to the
artist's rendition of the texture in bilerp mode would be:

s = u*m - 0.5;
t = v*n - 0.5;

since this would map a (u,v) range of [0.0-1.0] to an (s,t) range of
[-0.5 - 31.5], which would cover the region starting on the left edge of
texel 0 to the right edge of texel 31. However, the bilerp filter requires
two texels to bilerp between, and in the s,t ranges [-0.5 - 0.0] and
[31.0 - 31.5] there is only one texel available. This can be solved by
turning on clamping in the DP and setting SL,TL to 0,0 and SH,TH to 31,31.
This will cause the bilerp filter to select texel 0 for both texels to
bilerp between in the range [-0.5 - 0.0] and texel 31 for the range
[31.0 - 31.5]. This paradigm can be extended for wrapping textures by
clamping only at the border coordinates of the primitive. For example, a
primitive with u,v in the range [0.0-4.0] in wrap mode would repeat the
texture 4 times. For the border texels to be displayed in full, the s,t
range would have to be [-0.5 - 127.5] (according to the above mapping) and
the clamp parameters SL,TL and SH,TH would be set to 0,0 and 127,127
respectively. (Note that SL and TL are subtracted from the incoming texture
coordinates and are also used as the lower clamp value in clamp mode.)
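The two mappings can be sketched in C for a single axis. This is only an
illustration: the helper names are hypothetical, the texture size is passed
in texels, and the result is assumed to be wanted in s10.5 fixed point (5
fraction bits), rounded to the nearest step.

```c
/* Convert a coordinate in texels to s10.5 fixed point, rounding to the
 * nearest 1/32 texel. Hypothetical helper. */
int to_s10_5(double texels)
{
    double scaled = texels * 32.0;      /* 5 fraction bits */

    return (int)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
}

/* Point sample mode: the sample point is the texel's top left corner. */
int point_sample_coord(double u, int size)
{
    return to_s10_5(u * size);
}

/* Bilerp mode: the sample point is the texel center, hence the -0.5. */
int bilerp_coord(double u, int size)
{
    return to_s10_5(u * size - 0.5);
}
```

For a 32-texel axis, u = 0.0 and u = 1.0 give bilerp coordinates of -0.5
and 31.5 texels, which is the range that the clamp settings in the text are
meant to handle.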

If the border of the texture matches along the 4 edges, clamp can be turned
off and the bilerp filter will use the texel from the other edge of the
wrapping texture to filter to.

Note: Since point sampled and bilerp modes cause a shift of 0.5 texels in
the displayed primitive, to switch between point sampled and bilerp modes
without shifting the texture, one of the following methods may be used:
1) use a different primitive with a 0.5 shift in the texture coordinates;
2) set the 0.5 texel shift in SL and TL in the texture tile (SL and TL are
subtracted from the incoming texture coordinates).

Note: If the m x n texture is too large to fit in Tmem, the polygon and the
texture can be broken up along u,v and s,t in appropriately sized tiles.
For the bilerp to work along the tile boundaries, an extra row (or column)
of texels around each tile border needs to be loaded; that is, the
resulting polygons will be disjoint, but each tile (that is not a border
tile) will have an overlap of 2 texels with any adjacent tile.
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Figure 14-8 Sharpen Extrapolation 



(Figure: plot of color against L, in texels/pixel, written as
L_Tile.L_Frac, with the magnify interval marked between texels A and B.)

The change in color between texel A and B is extrapolated using the
equation P = A + (B-A)*(Lfrac-1.0).

Note that the extrapolation makes the dark texel even darker, and light
texels become lighter after the extrapolation, thus enhancing the apparent
sharpness of the edge.
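The extrapolation equation from Figure 14-8 can be tried out with a
one-line C function. This is a sketch for experimenting with values; the
function name sharpen_extrapolate is hypothetical, floating point is used
instead of the hardware's fixed-point arithmetic, and no clamping of the
result to the displayable range is modeled.

```c
/* P = A + (B - A) * (Lfrac - 1.0), as given for sharpen mode.
 * For Lfrac below 1.0 the result moves away from B, past A. */
double sharpen_extrapolate(double a, double b, double l_frac)
{
    return a + (b - a) * (l_frac - 1.0);
}
```

For example, with A = 50, B = 100, and Lfrac = 0.5 the result is 25, so a
dark texel becomes darker; swapping A and B gives 125, so a light texel
becomes lighter.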
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Texture Memory 



Memory Organization 

Because texturing requires a large number of random accesses with
consistent access time to texture memory, it is impractical to texture
directly from DRAM. The approach taken by the Nintendo 64 system is to
cache up to 4 KB of an image in an on-chip, high-speed texture memory
called Tmem. All primitives are textured using the contents of Tmem. The
basic sequence of events needed to texture a primitive is:

1. Load a texture tile into Tmem.

2. Describe attributes of the texture tile.

3. Render primitives that use this tile.

Tmem should in fact be considered a cache from the programmer's point of
view. Since each tile must be loaded from DRAM, it makes sense to render
as many primitives as possible using the current tile before loading the
next one in order to conserve DRAM bandwidth.

Physically, Tmem is arranged as shown in Figure 14-9. L0-3 are referred to
as the low half of Tmem; H0-3 are referred to as the high half of Tmem.

Figure 14-9 Physical Tmem Diagram

For loading, Tmem is arranged logically as shown in Figure 14-10.

Figure 14-10 Tmem Loading

(Figure: Load Data passes through the Alignment Logic into Tmem, which is
addressed by the Load Address and is 64 bits wide and 512 words deep.)



The following table shows the maximum tile sizes that can be stored in the
4 KB texture memory. Images larger than this will be tiled.

Table 14-8 Maximum Texels in TMEM

Texel Type             Maximum Texel Count

4-bit (I, IA)          8K

4-bit Color Index      4K (plus 16 palettes)

8-bit (I, IA)          4K

8-bit Color Index      2K (plus 256-entry LUT)

16-bit RGBA            2K

16-bit IA              2K

16-bit YUV             2K Y's, 1K UV pairs

32-bit RGBA            1K
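The counts in Table 14-8 follow from the 4 KB size of Tmem and the texel
size. The following C sketch reproduces them; the function name max_texels
is hypothetical, and it assumes that color index textures may use only the
lower 2 KB because the upper half holds the look-up table.

```c
#define TMEM_BYTES 4096     /* total size of Tmem */

/* Largest number of texels of a given size that fit in Tmem.
 * color_index is nonzero for CI textures, which share Tmem with a TLUT. */
int max_texels(int bits_per_texel, int color_index)
{
    int bytes = color_index ? TMEM_BYTES / 2 : TMEM_BYTES;

    return bytes * 8 / bits_per_texel;
}
```

For instance, max_texels(4, 0) is 8192 (8K) and max_texels(8, 1) is 2048
(2K), matching the first and fourth rows of the table.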
Four-bit textures are stored in Tmem as shown in Figure 14-11.

Figure 14-11 Four-Bit Texel Layout in Tmem

Eight-bit textures are stored in Tmem as shown in Figure 14-12.

Figure 14-12 Eight-Bit Texel Layout in Tmem

(Figure: 8-bit texture, 10 texels per row; texel indices are in hex.)

Sixteen-bit textures (except YUV) are stored in Tmem as shown in
Figure 14-13.

Figure 14-13 Sixteen-Bit Texel Layout in Tmem

Sixteen-bit YUV textures are stored in Tmem as shown in Figure 14-14. Note
that YUV texels must be loaded in pairs; in other words, two Y's at a time.
Also note that if filtering is enabled, an additional UYVY pair must be
loaded per row and SH set accordingly to allow proper filtering of the last
UV texel per row.

Figure 14-14 YUV Texel Layout in Tmem

Thirty-two-bit (RGBA) textures are stored in Tmem as shown in
Figure 14-15.

Figure 14-15 Thirty-Two-Bit RGBA Texel Layout in Tmem

For color index (CI) textures, the texture is stored in the lower half of
Tmem, and the Texture/Color Look-Up Table (TLUT) is stored in the upper
half of Tmem. For 4-bit CI textures, the texels (or indices) addressed in
the lower half of Tmem have the 4-bit palette number for the tile prepended
to create an 8-bit address into the upper half of Tmem. Since four texels
are addressed simultaneously, there must be four (usually identical) TLUTs
stored in the upper half of Tmem across the four banks.

For 4-bit CI textures, the palette effectively selects one of sixteen
possible tables, each table having sixteen entries. Each table is aligned
on 16-word boundaries. Note that there are two choices for the texel type
that resides in the TLUT: 16-bit RGBA or 16-bit IA. The type is selected
using the gDPSetTextureLUT() command. This command also configures the
Tmem as shown in Figure 14-16. Because of this, CI textures cannot be
combined with other texture types in two-cycle mode.
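The way a color index texel selects a TLUT entry can be sketched in C.
These helpers are illustrative, with hypothetical names; they assume one
TLUT entry per 64-bit Tmem word and a TLUT that begins at word 256, the
start of the upper half of Tmem.

```c
/* 4-bit CI: the tile's 4-bit palette number is prepended to the texel. */
unsigned tlut_index_ci4(unsigned palette, unsigned texel)
{
    return ((palette & 0xFu) << 4) | (texel & 0xFu);
}

/* 8-bit CI: the texel addresses the 256-entry table directly. */
unsigned tlut_index_ci8(unsigned texel)
{
    return texel & 0xFFu;
}

/* Tmem word that holds a given TLUT entry (upper half starts at 256). */
unsigned tlut_word_address(unsigned index)
{
    return 256u + index;
}
```

Palette 2 with texel value 5 selects entry 37, which is the sixth entry of
the third 16-entry table.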
Figure 14-16 Tmem Organization for Eight-Bit Color Index Textures

Eight-bit CI textures do not use the palette number of the tile, since they
address the whole 256-entry TLUT directly. It is possible to use the 8-bit
mode for storing index textures that have between 16 and 256 entries.

For example, you could define a texture that had 40 entries, numbered 0-39,
and load the TLUT into the upper half of Tmem (word 256). Further suppose
that you had another texture with indices 40-69. You could load this
texture's 30-entry TLUT into Tmem, starting at word 296.

Assuming that both textures together fit into the lower half of Tmem
(2 KB), these textures could be co-resident in Tmem. It is also possible to
have CI textures co-resident with other non-CI textures.

In the above example, you are using only the first 70 words of upper Tmem
for TLUTs. You could use the remaining 186 words to store a 4-bit texture,
for example. Note that even though you can store CI and other types
together in Tmem, you cannot access these types simultaneously in
two-cycle mode, because the configuration of the Tmem for CI textures is
controlled with a mode bit that must be updated using the
gDPSetTextureLUT command, as mentioned previously.

Figure 14-17 Tmem Organization for Four-Bit CI Textures
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Texel Formatting 

In the RDP graphics pipeline, most operations are done on
8-bit-per-component RGBA pixels. After looking up the texels, the texture
unit converts them into the 32-bit RGBA format. Table 14-9 describes how
each type is converted. The format for the table descriptions is [MSB:LSB],
where MSB is the most significant bit and LSB is the least significant bit.
Bit fields are grouped together in braces {} with the most significant
field on the left and the least significant field on the right.

Table 14-9 Texel Output Formatting

Type    Size    Input Format    Output Format: Red    Green
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[3:0]} 
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Loading a texture actually consists of several steps. Internally; the RDP treats 
loading a texture as if it were rendering a textured rectangle into Tmem. To
load a texture, you must describe the texture tile to be loaded, render
(load) it into Tmem, and describe the tile to be rendered. An important
consequence of these steps is that you can load a texture in one way and
render it in a completely different way.

For example, the GBI macro gsDPLoadTextureTile performs all the tile and
load commands necessary to load a texture tile. The sequence of commands
is shown below (macros shown without parameters):

gsDPSetTextureImage
gsDPSetTile         /* G_TX_LOADTILE */
gsDPLoadSync
gsDPLoadTile        /* G_TX_LOADTILE */
gsDPSetTile         /* G_TX_RENDERTILE */
gsDPSetTileSize     /* G_TX_RENDERTILE */

This sequence of commands loads a texture tile using the tile descriptor
G_TX_LOADTILE (tile 7) and renders using G_TX_RENDERTILE (tile 0).
Since the tile descriptor used to load the tile is different from the one
used to render the texture, there is no possibility of tile usage conflict,
so a TileSync command is unnecessary. The TileSync command is used in
situations where you may want to use the same tile for both loading and
rendering a texture. It basically inserts a bubble in the RDP pipeline to
guarantee that the load tile descriptor isn't changed by the render tile
before the load is actually done.

The gsDPSetTextureImage command sets a pointer to the location of the
image. Then the gsDPSetTile is used to indicate where in Tmem you want to
place the image, how wide each line is, and the format and size of the
texture. A gsDPLoadSync command makes sure that any previous load is
completely finished before this texture is loaded. Then the actual
gsDPLoadTile command is issued, which loads the texture into Tmem. The
final gsDPSetTile and gsDPSetTileSize are used to set the tile descriptors
correctly for the tile used when rendering.
The textures are stored big-endian in memory and should obey the
following format for a 64-bit word in memory.

Figure 14-18 Texel Formats in DRAM

The LoadTile command allows a programmer to load an arbitrary
rectangular region of a larger texture in DRAM into Tmem. The following
examples assume a 16-bit texel type.

Figure 14-19 Example of LoadTile Command Parameters

(Figure: tile to be loaded using the LoadTile command, assuming a texel
size of 16 bits, with SL = 20, TL = 2, SH = 39, TH = 7. Each line of the
tile, for example texels 140-159, will be at least one DRAM transfer. The
advantage of LoadTile is that you can load arbitrary tiles from a larger
map.)



When textures are loaded as a tile, it means that (at least) each line of
the texture is a separate DRAM transfer. Each line's transfer may be broken
into multiple smaller transfers, depending on how big it is and whether it
crosses DRAM page boundaries. Since the DRAMs are block-transfer type
devices, there is a fixed amount of overhead for each transfer, so long
transfers are desirable. For this reason, you should try to load your
texture using the longest dimension of the tile. Also, each line of a tile
is padded automatically to Tmem word (64-bit) boundaries. If your tile line
size is not a multiple of 64 bits, some Tmem space is being wasted. Also,
when tiling a larger texture image into multiple tiles, an extra row and
column are usually loaded to allow proper filtering of the texels along the
border of the tile (to avoid seams).

Note: The RDP commands LoadTile, LoadBlock, and LoadTLUT set the tile
parameters SL, TL, SH, TH when they are executed. After the load command,
it may be necessary to use the SetTileSize command to restore these
parameters if you want parameters other than those used in the Load
command. In the gbi.h texture load macros, the SetTileSize command is
always used following a Load command.
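The padding of each tile line to 64-bit Tmem words, described above, can be
worked out with a short C sketch. The function names are hypothetical; SL
and SH are taken here as whole texel columns, inclusive, rather than as
10.2 fixed-point values.

```c
/* Number of 64-bit Tmem words occupied by one line of a tile that spans
 * texel columns sl through sh inclusive. */
unsigned tile_line_words(unsigned sl, unsigned sh, unsigned bits_per_texel)
{
    unsigned bits = (sh - sl + 1) * bits_per_texel;

    return (bits + 63) / 64;            /* round up to whole words */
}

/* Bits of Tmem wasted per line because of that rounding. */
unsigned tile_line_wasted_bits(unsigned sl, unsigned sh,
                               unsigned bits_per_texel)
{
    unsigned bits = (sh - sl + 1) * bits_per_texel;

    return tile_line_words(sl, sh, bits_per_texel) * 64 - bits;
}
```

A 16-bit tile with SL = 20 and SH = 39 is 20 texels wide and fills exactly
5 words per line, while a 42-texel line needs 11 words and wastes 32 bits,
the equivalent of two 16-bit texels.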



Wrapping a Large Texture Using Load Tile 

It is possible to effectively 'wrap' large textures (textures too large to
fit entirely in Tmem) by careful loading using the LoadTile command. There
are (at least) two methods for doing this. Figure 14-20, "Wrapping a Large
Texture Using Two Tiles," on page 251 shows a large texture in memory. We
want to load a tile as if the texture had been wrapped in the S direction,
and the tile straddles the wrap region.

Figure 14-20 Wrapping a Large Texture Using Two Tiles

(Figure: the large texture in DRAM, 60 texels per line, with line start
offsets 0, 60, 120, ..., 540 and line end offsets 59, 119, ..., 599, showing
Tile 1 and Tile 2; and the wrapped large texture (virtual), showing the
tile we would like to load straddling the wrap region, with widths m and
n.)

NU6-06-0030-001G of October 21, 1996 



251 



NINTENDO 64 PROGRAMMING MANUAL 



DRAFT 



One way to effectively load the wrapped tile is to actually load two
interleaved tiles. To interleave two tiles in Tmem, load Tile 1 but set the
tile's Line parameter to n+m Tmem words, where n is the number of words in
a line of Tile 1 and m is the number of words in a line of Tile 2.
SL, SH, TL, TH should be set to load Tile 1. Now load Tile 2, setting the
tile's Tmem Address to n. Also set the SL, TL, SH, TH for Tile 2. After the
load, set the render tile's Tmem Address to 0. Set SL, SH, TL, TH to the
total composite tile size. Note that only Tile 1's width must be a multiple
of Tmem words. Tile 2's width can be any number of texels, and the
remainder of the last Tmem word for each line will simply be undefined.
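The interleaving of the two tiles can be pictured with a small C sketch of
where each word lands in Tmem. It is illustrative only: the function names
are hypothetical, the common Line parameter is taken to be n+m words, and
the swapping of data in odd lines that the hardware performs is not
modeled.

```c
/* Tmem word address of word w (w < n) on a given line of Tile 1, which
 * is loaded at Tmem Address 0 with a Line of n + m words. */
unsigned tile1_word_address(unsigned line, unsigned w,
                            unsigned n, unsigned m)
{
    return line * (n + m) + w;
}

/* Tmem word address of word w (w < m) on a given line of Tile 2, which
 * is loaded at Tmem Address n with the same Line of n + m words. */
unsigned tile2_word_address(unsigned line, unsigned w,
                            unsigned n, unsigned m)
{
    return n + line * (n + m) + w;
}
```

Each composite line therefore holds the n words of Tile 1 followed directly
by the m words of Tile 2, which is what lets the render tile treat them as
one tile starting at Tmem Address 0.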

Another, possibly more straightforward method relies on the fact that at
the end of each line of the large texture, the addresses will naturally
roll into the next line.

Figure 14-21 Wrapping a Large Texture Using One Tile

(Figure: the large texture, with a single tile loaded as one contiguous
line across the wrap; bogus texels appear at the start of the tile and at
the end of the tile.)

So, as shown in Figure 14-21, "Wrapping a Large Texture Using One Tile,"
on page 252, you can load a single tile starting at address 60 minus m
words. The tile's Line parameter should equal m+n. Set the Tmem Address
parameter to 0 during the load. Make sure to load T+1 lines. After the
load, set Tmem Address to m, and set the SL, SH, TL, TH to the actual tile
size. This method wastes m words at the beginning of Tmem and n words at
the end of Tmem but has the advantage of using only one load.
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Load Block 

A more memory-bandwidth-efficient way to load textures is the LoadBlock
command. This command essentially treats each texture as a single long line
of data. This allows the MI to transfer the maximum amount of data for each
transfer.

Figure 14-22 Example of LoadBlock Command Parameters

(Figure: texel offsets in DRAM for a texture whose lines start at 0, 44,
88, ..., 396 and end at 43, 87, ..., 439. The actual texture line size is 42
texels; each line is padded by 2 texels to get an integral number of 64-bit
words per line. Memory will be accessed as one continuous line of texels
from 0 to 439. The line number is determined in texture hardware by
accumulating dxt. DRAM transfers will be the largest possible considering
span buffer size and page crossings.)

dxt = (1 line / 44 texels) x (4 texels / 1 word) = 1/11



The LoadBlock command uses the parameter dxt to indicate when it should
start the next line. Dxt is basically the reciprocal of the number of words
(64 bits each) in a line. The texture coordinate unit increments a counter
by dxt for each word transferred to Tmem. When this counter rolls over into
the next integer value, the line count is incremented. The line count is
important because the data in odd lines is swapped to allow interleaved
access when rendering. This works well when dxt is a power of two. However,
if dxt is not a power of two, the line counter can be corrupted due to
accumulated error. Appendix A contains a table that indicates how many lines
of a certain size can be in a load block for a tile before the line count is
corrupted.
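
The accumulated error described above can be explored with a small host-side
model. This is a sketch, not SDK code: it assumes that dxt is stored with 11
fractional bits, that it is rounded up to the next representable value, and
that the line count is the integer part of the running sum. The names
calc_dxt and max_clean_lines are hypothetical.

```c
#include <assert.h>

/* Assumed precision of the dxt fraction; not stated in this section. */
#define DXT_FRAC_BITS 11

/* Reciprocal of the words-per-line count, rounded up (assumed rounding). */
unsigned calc_dxt(unsigned words_per_line)
{
    assert(words_per_line > 0);
    unsigned one = 1u << DXT_FRAC_BITS;
    return (one + words_per_line - 1) / words_per_line;
}

/*
 * Count how many lines load before the integer part of the accumulated
 * counter disagrees with the true line number. Returns line_cap if no
 * disagreement is seen within line_cap lines.
 */
unsigned max_clean_lines(unsigned words_per_line, unsigned line_cap)
{
    unsigned dxt = calc_dxt(words_per_line);
    unsigned long counter = 0;

    for (unsigned line = 0; line < line_cap; line++) {
        for (unsigned w = 0; w < words_per_line; w++) {
            if ((counter >> DXT_FRAC_BITS) != line)
                return line;        /* line count has drifted */
            counter += dxt;         /* one word transferred */
        }
    }
    return line_cap;
}
```

Under these assumptions a line of 8 words never drifts, because 1/8 is exact
in binary, while max_clean_lines(11, 46) stops at 20 lines, which agrees with
the 44-texel row of the table in Appendix A.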

It is possible to load a set of texture tiles using a single LoadBlock command 
(MIP maps, for example). However, if the tiles have different widths, the 
single dxt parameter is not enough to do proper interleaving. In these cases, 
the data must be pre-interleaved and the dxt parameter should be set to zero. 



NU6-06-O030-001G of October 21, 1996 






NINTENDO 64 PROGRAMMING MANUAL DRAFT 



The LoadTlut command is an efficient way of loading texture look-up tables
into the high half of TMEM. System memory is conserved using this
command as each 16-bit color value is "quadricated" as it is read in and
written to the TMEM. In other words, it isn't necessary to store four times
the data in memory. The load hardware will expand it out into a 64-bit word
during the load. This saves system memory as well as memory bandwidth.
Two types of TLUTs are supported: 16-bit RGBA and 16-bit IA. TLUT depth
can range from 16 words (4-bit CI) to 256 words (8-bit CI). LoadTile or
LoadBlock can still be used for loading the TLUT; however, the data will
have to be quadricated in system memory first.
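
For the LoadTile or LoadBlock route, the quadrication has to be done on the
host. The following is a minimal sketch of that expansion, assuming each
16-bit entry is simply repeated in all four 16-bit lanes of a 64-bit word;
the function names are hypothetical and are not part of the SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replicate one 16-bit palette entry into all four lanes of a 64-bit word. */
uint64_t quadricate_entry(uint16_t color)
{
    uint64_t c = color;
    return (c << 48) | (c << 32) | (c << 16) | c;
}

/* Expand count palette entries; out must have room for count 64-bit words. */
void quadricate_tlut(const uint16_t *in, uint64_t *out, size_t count)
{
    assert(count == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < count; i++)
        out[i] = quadricate_entry(in[i]);
}
```

A 256-entry table expanded this way occupies 2 KB instead of 512 bytes,
which is the memory cost that LoadTlut avoids.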

Loading Notes

4-bit types should be loaded as 16-bit types and then rendered as 4-bit types.
This does not restrict 4-bit types in any way and still allows for rows with an
odd number of 4-bit texels.

When using LoadBlock, no more than 2048 texels can be loaded at once. So, for
example, if you wanted to load 4K 8-bit texels, load them as 2K 16-bit texels
and then render them as 8-bit texels. If you're using 16-bit or 32-bit texels
there is no need for a special case, since TMEM cannot hold more than 2K
16-bit or 1K 32-bit texels.
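
The repacking arithmetic is simple enough to check on the host. The sketch
below assumes only the 2048-texel limit quoted above; the helper names are
hypothetical.

```c
#include <assert.h>

#define LOADBLOCK_MAX_TEXELS 2048u  /* limit quoted in the text above */

/*
 * Number of 16-bit texels that cover texel_count texels of the given
 * size (4 or 8 bits per texel), rounded up to a whole 16-bit texel.
 */
unsigned texels_as_16bit(unsigned texel_count, unsigned bits_per_texel)
{
    assert(bits_per_texel == 4 || bits_per_texel == 8);
    unsigned per_16 = 16u / bits_per_texel;
    return (texel_count + per_16 - 1) / per_16;
}

/* Nonzero if the data fits in one LoadBlock when loaded as 16-bit texels. */
int fits_one_loadblock(unsigned texel_count, unsigned bits_per_texel)
{
    return texels_as_16bit(texel_count, bits_per_texel) <= LOADBLOCK_MAX_TEXELS;
}
```

For the 4K 8-bit case in the text, texels_as_16bit(4096, 8) gives 2048, so
the whole texture fits in a single load.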

To improve performance by minimizing the number of syncs required, the
user can interleave the tile loads and renders with different tile indices.
For example, load using tile 7 while rendering using tile 0.

After texture coordinates are converted to Tile Space, they may be wrapped,
clamped, or mirrored. Figure 14-23 shows how wrapping, mirroring, and
clamping affect the tile-relative coordinates. The S and T coordinates have
independent controls for wrapping, mirroring, and clamping.

Figure 14-23 Wrapping, Mirroring, and Clamping

[Figure: a base map rendered with five settings: Wrap S,T; Mirror S, Wrap T;
Mirror S,T; Clamp S, Wrap T; Wrap S, Clamp T.]

Figure 14-24 Wrapping Within a Texture Tile

[Figure: a textured log built from 3 textured cylinders. Labels: "clamp" on
each end cylinder, "Wrap every" on the middle cylinder, and "128".]

Textured log using 3 textured cylinders. The middle cylinder sets the tile's
mask to 6 so that the texture wraps every 64 texels. The end cylinders set
the tile's clamp bit and have coordinates that access the jagged part of the
texture. Advantages include easier modeling, use of one load command, and
possibly tighter Tmem packing than if two separate textures were used.

Figure 14-25 Example of Texture Decals

[Figure: three panels.

  Airplane Wing Insignia, Cycle 0: alpha at edges of insignia; Mirror s,t;
  Clamp s,t.

  Airplane Wing Camo, Cycle 1: Wrap s,t; Mirror s,t.

  Airplane wing camo and insignia combined in Color Combiner using the
  insignia alpha to lerp between the camo and insignia color.]
Texture Types and Modes

The following is a list of restrictions concerning the use of certain texture
types in certain modes:

Point Sample

Clamp and/or (wrap | mirror) works for all texel types.

Filter

Clamp works for all texel types. Wrap t | mirror t | (clamp t & wrap t) |
(clamp t & mirror t) works for all texel types.

Wrap s | mirror s | (clamp s & wrap s) | (clamp s & mirror s) works for all
texel types except YUV.

Copy

Clamping is implicitly disabled for copy mode. 32-bit RGBA and YUV texel
types are not supported. To copy these types, they should be loaded and
copied as 16-bit RGBA type texels. When using a 16-bit RGBA type to copy
a 32-bit RGBA or YUV texture, mirroring in s is not supported.

Wrap or mirror works for 4, 8, and 16-bit types.

LOD

You must put the RDP in two-cycle mode to use texture LOD.

Alignment 

The texture image pointer, as defined using the gDPSetTextureImage
command, must be 8-byte aligned. Additionally, each tile must be aligned
according to its size. For example, 8-bit texture tiles must be aligned to
8-bit boundaries, 16-bit textures to 16-bit boundaries, etc. One exception
is 4-bit tiles, which must be accessed on byte (8-bit) boundaries.



Tiles

The maximum size of a tile is 256 rows (t coordinate) and 1024 texels (s
coordinate) within the limits of Tmem size. It is better to always make the s
coordinate the longer coordinate in terms of load performance.

You should avoid shifting coordinates left using the shift parameter of a tile
unless necessary. See the example under Multiple Tile Effects in the
Applications section.

Coordinate Range

The valid texture coordinate range is currently from -1024.0 to +1023.99, a
total range of 2K texels across a primitive. The texture hardware can handle
this full range without any noticeable loss of accuracy. For small coordinate
ranges, however, if given a choice of coordinates close to zero or coordinates
close to 1024, slightly higher quality may result from the lower coordinates.
Multiple Tile Effects

Interference Textures

Since you can access two separate tiles in two-cycle mode, it is easy to
achieve interference pattern effects. Of course, you can use textures that
are different sizes (wrapping on different intervals) to decrease the amount
of apparent repetition. This is especially useful for textures on terrain or
for waves on the ocean, for example.

Lighting with Textures 

Multiple tiles can be used for lighting effects. In the example below, a small
texture is repeated many times but a small light texture is scaled up to create
the effect of a spotlight. In this example, the input coordinates should be
defined using Tex 0's coordinates. The shift parameter of the tile descriptor
for Tex 1 could be used to right shift the input coordinates to the required
values. It would be a bad idea to use Tex 1's coordinates as the input
coordinates and then left shift to obtain Tex 0's coordinates. This is
because when you shift left, you shift zeros into the LSBs of the coordinate,
thus losing precision.

[Figure: a primitive textured with Tex 0 and Tex 1. Tex 0 coordinates at two
corners: (200, 0) and (200, 50). Tex 1 coordinates at the same corners:
(50, 0) and (50, 25).]

Extended Alpha Using Multiple Textures 

The 16-bit RGBA texture type is often used to texture sprites and billboards
because this is the only type that allows a large number of colors.
Unfortunately, this type only has one bit of alpha, which means you cannot
prefilter texture edges, and this can lead to pixelated texture edges.

One way to get more bits of alpha (in order to create smoother outlines) is to
use two tiles. The first tile describes the RGB color of the texture, while the
second tile describes the alpha channel of the texture. Render the texture in
two-cycle mode. In the color combiner, select T0 as the source and in the
alpha combiner select T1 as the source.

A code fragment indicates how to set the combine modes and load the
textures:

#define MULTIBIT_ALPHA 0, 0, 0, TEXEL0, 0, 0, 0, TEXEL1

	/* use special combine mode */
	gsDPSetCombineMode(MULTIBIT_ALPHA, G_CC_PASS2),

	/*
	 * Load alpha texture at Tmem = 256, notice I use a
	 * different load macro that allows specifying Tmem
	 * address.
	 */
	_gsDPLoadTextureBlock_4b(I4molecule, 256, G_IM_FMT_I,
		32, 32, 0,
		G_TX_WRAP, G_TX_WRAP,
		5, 5, G_TX_NOLOD, G_TX_NOLOD),

	/*
	 * Load color texture starting at Tmem=0
	 */
	gsDPLoadTextureBlock(RGBA16molecule, G_IM_FMT_RGBA,
		G_IM_SIZ_16b, 32, 32, 0, G_TX_WRAP, G_TX_WRAP,
		5, 5, G_TX_NOLOD, G_TX_NOLOD),

	/*
	 * Since normal load macros use tile 0 for render, I
	 * need to set tile 1 manually to point at alpha
	 * texture.
	 */
	gsDPSetTile(G_IM_FMT_I, G_IM_SIZ_4b, 2, 256, 1,
		0,
		0, 0, 0,
		0, 0, 0),
	gsDPSetTileSize(1, 0, 0, 31 << 2, 31 << 2),

	/* make sure in two-cycle mode */
	gsDPSetCycleType(G_CYC_2CYCLE),
Appendix A: LoadBlock Line Limits

The table below lists the maximum number of lines that can be properly
transferred for a given texture width.

Note: The absolute max lines column refers to the number of lines that could
be transferred if only limited by Tmem size. If the absolute max lines field
is empty, it indicates that max lines is equal to absolute max lines. If the
max lines field is empty (shown as -- below), it indicates that zero lines
could be transferred correctly using these parameters.

This table only applies to 16-bit texels.

Table 14-10 Limits on Number of Lines for LoadBlock Command

Width           Max Lines    Absolute
(16b texels)                 Max Lines

4               512
8               256
12              170
16              128
20              102
24              85
28              73
32              64
36              56
40              51
44              20           46
48              42
52              26           39
56              14           36
60              19           34
64              32
68              13           30
72              28
76              26
80              8            25
84              9            24
88              4            23
92              4            22
96              5            21
100             20
104             13           19
108             18
112             3            18
116             6            17
120             3            17
124             2            16
128             16
132-136         2            15
140             3            14
144             14
148             2            13
152             13
156             2            13
160             1            12
164             12
168             4            12
172-184         2            11
188-192         2            10
196             4            10
200             10
204             --           10
208             1            9
212             2            9
216             9
220             --           9
224             1            9
228             8
232             --           8
236             2            8
240             --           8
244             1            8
248             --           8
252             1            8
256             8
260-264         --           7
268             1            7
272             --           7
276             1            7
280             --           7
284             2            7
288-292         --           7
296             1            6
300             --           6
304             6
308-312         --           6
316             4            6
320-324         --           6
328             6
332-340         --           6
344             1            5
348-356         --           5
360             1            5
364-372         --           5
376             1            5
380-388         --           5
392             2            5
396-408         --           5
412             1            4
416-428         --           4
432             4
436-452         --           4
456             4
460-480         --           4
484             1            4
488-508         --           4
512             4
516-544         --           3
548             2            3
552-584         --           3
588             1            3
592-628         --           3
632             2            3
636-680         --           3
684             2
688-744         --           2
748             1            2
752-816         --           2
820             2
824-908         --           2
912             2
916-1020        --           2
Chapter 15

Texture Rectangles (Hardware Sprites)

Warning: Code fragments in this chapter have not been fully verified.
A demo containing these examples will be included in a future software
release.

A texture rectangle is a special primitive supported by the Reality Display
Processor (RDP) hardware. This primitive is intended to provide simple
'sprite' capabilities, with a minimum number of parameters. Texture
rectangles are screen-aligned rectangles whose coordinates are defined
directly in screen space.

Example 15-1 Texture Rectangle Command

gsDPTextureRectangle(xl, yl, xh, yh, tile, s, t, dsdx, dtdy)

Texture coordinates are defined by specifying the start point S and T
coordinates at the top left corner of the rectangle and the step in S per pixel
in X and the step in T per pixel in Y. Example 15-2 shows a rectangle 100
pixels wide by 100 pixels high drawn at screen coordinates (100,100). The
texture coordinates at the top left corner of the rectangle are (0,0). The
texture steps 1 texel per pixel in both the S and T directions. This example
assumes that a texture has been previously loaded (see "Texture Loading"
on page 248).

Example 15-2 Texture Rectangle Example

gsDPSetTexturePersp(G_TP_NONE),
gsDPTextureRectangle(100<<2, 100<<2, 200<<2, 200<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<10, 1<<10),
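
The shifts in the example are fixed-point conversions: screen coordinates use
2 fractional bits, the S and T start point 5, and the steps 10. The helpers
below are a host-side sketch of those conversions with hypothetical names;
they assume that negative values are passed as 16-bit two's complement and
that fractions are truncated toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Screen coordinate in whole pixels -> unsigned 10.2 fixed point. */
uint16_t screen_to_10_2(unsigned pixels)
{
    assert(pixels < 1024);
    return (uint16_t)(pixels * 4u);
}

/* Texel coordinate -> signed 10.5 fixed point (16-bit two's complement). */
uint16_t texcoord_to_s10_5(double texels)
{
    assert(texels >= -1024.0 && texels < 1024.0);
    return (uint16_t)(int32_t)(texels * 32.0);
}

/* Texels stepped per pixel -> signed 5.10 fixed point. */
uint16_t step_to_s5_10(double texels_per_pixel)
{
    assert(texels_per_pixel >= -32.0 && texels_per_pixel < 32.0);
    return (uint16_t)(int32_t)(texels_per_pixel * 1024.0);
}
```

With these, screen_to_10_2(100) equals 100<<2 and step_to_s5_10(1.0) equals
1<<10, the values used in Example 15-2.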
Caution: The perspective divide of the texture coordinates in the RDP must
be disabled using the gsDPSetTexturePersp() command when rendering
texture rectangles.

Texture rectangles are two-dimensional (2-D): they may be translated in X
and Y but not rotated. Texture rectangles may be z-buffered in a limited
way as described in "Z-Buffering Texture Rectangles" on page 299. Even
though they are simple and limited to two dimensions, texture rectangles are
useful both in 2-D sprite games and for 2-D effects in 3-D games. This
chapter will explain some of the details associated with the texture rectangle
primitive and provide some simple examples for new Nintendo-64
programmers. Some of the information found in this chapter may also be
found in other chapters but is repeated here for completeness.

Figure 15-1 Texture Rectangle Definition

[Figure: a rectangle on the screen (origin 0,0) mapped from a texture.
Labels: xl, yl (10.2 fixed point); xh, yh (10.2 fixed point); s, t (s,10.5);
dsdx (s,5.10); dtdy (s,5.10).]
Sampling Overview

A texture is an array of values, where each value is a set of numbers
(components) describing the attributes of a texture element, or texel. For the
Nintendo 64, the numbers representing a texel are fixed-point. The number
of components per pixel and the number of bits per component is variable.
"Color Index Frame Buffer" on page 298 describes the possible formats for
texels.

When displaying a texture on the screen or a display, we must perform a
mapping from the texture space to the display image space. In the case of
texture rectangles, where the geometric operations are limited to scaling and
translation, the main problem is how to sample and filter the source texture
so that it is faithfully reproduced on the display. Figure 15-2 is one example
of aliasing artifacts that can affect image quality. In this example, 10 black
bars are separated by 10 white bars with even spacing. The bars cover a width
of 11 pixels on the screen. Because we are sampling at a lower frequency than
the texture, our output image is aliased. Aliasing artifacts are caused by
high-frequency information that is insufficiently sampled appearing as
low-frequency information. Furthermore, if the beginning sample point is
moved slightly, the sampled image can shift dramatically. During
animations this causes the displayed image to scintillate or flash. Nyquist's
Law indicates that the sampling frequency should be greater than twice the
highest frequency component in the texture to avoid aliasing artifacts.

Figure 15-2 Aliasing in a Sampled Image

[Figure: a scanline of alternating bars, the sampling points along it, and
the resulting samples.]



Point Sampling 

Point sampling in the Nintendo 64 means that we assume that each texel
maps to one pixel on the display, and we ignore any fractional overlap
between texels and pixels. Example 15-3 shows how to enable point
sampling.

Example 15-3 Enable Point Sampling

gsDPSetTextureFilter(G_TF_POINT)

Point sampling works well for mapping a rectangular texture to a
screen-aligned rectangle of the same size on the display. Problems occur if
the sampling ratio is not 1:1, however, as shown in Figure 15-3. In the first
case, we display 10 texels using 10 pixels. In the second case, we scale the
image slightly by displaying 9 texels on 10 pixels. This results in the middle
pixel having the same color as the previous bar. In general, point sampled
images should be scaled by an integer power of two to avoid this problem.
To achieve other scalings, it is necessary to use bilinear filtering.
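
The uneven duplication can be reproduced with integer arithmetic. This
sketch assumes the sample for pixel x is taken at x times dsdx, starting
from s = 0, and that point sampling keeps only the integer part; which pixel
ends up duplicated on real hardware depends on the exact sample position,
which is not modeled here. The function names are hypothetical.

```c
/* Texel column chosen for a pixel column: integer part of pixel * dsdx. */
int point_sample_texel(int pixel, unsigned dsdx_s5_10)
{
    return (int)(((unsigned long)pixel * dsdx_s5_10) >> 10);
}

/* Count the pixels that show the same texel as the pixel to their left. */
int count_repeated_texels(int pixels, unsigned dsdx_s5_10)
{
    int repeats = 0;
    for (int x = 1; x < pixels; x++) {
        if (point_sample_texel(x, dsdx_s5_10) ==
            point_sample_texel(x - 1, dsdx_s5_10))
            repeats++;
    }
    return repeats;
}
```

With a step of 1<<10 nothing repeats. With a step of about 0.9 (922 in
s5.10) exactly one of 10 pixels repeats its neighbor, which is the visible
glitch. With a step of 1<<9 every texel is shown exactly twice, so the
duplication is uniform.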

Figure 15-3 Point Sampling Scaling Problem

[Figure: bars sampled at 1:1 scaling compared with 9 texels displayed on 10
pixels.]
Example 15-4 demonstrates 3 texture rectangles with the texture scaled by 1,
2, and 4 respectively:

Example 15-4 Scaled, Point Sampled Textures

gsDPSetTextureFilter(G_TF_POINT),
gsDPTextureRectangle(50<<2, 50<<2, 150<<2, 150<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<10, 1<<10),
gsDPTextureRectangle(60<<2, 60<<2, 160<<2, 160<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<9, 1<<9),
gsDPTextureRectangle(70<<2, 70<<2, 170<<2, 170<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<8, 1<<8),

Point sampling also implies that animated sprites will have to move in
one-pixel increments. Even though the rectangle can be positioned with 2
bits of subpixel precision, and the texture can be offset to 5 bits of
fractional precision, the point sampling only looks at the integer coordinate
and so will not change until there is at least a one pixel change in position.
Bilinear filtering allows for smoother motion of sprites.

Bilinear Filtering

Instead of selecting a single texel for a given pixel, as in point sampling,
bilinear filtering selects four texels surrounding the sample point and
interpolates these points using fractional position information to determine
the pixel color. Example 15-5 shows how to enable texture filtering.

Example 15-5 Enable Bilinear Filtering

gsDPSetTextureFilter(G_TF_BILERP)
An example of bilinear filtering is shown in Figure 15-4.

Figure 15-4 Bilinear Filtering

[Figure: a sample point inside a 2x2 grid of texels TL, TR, BL, and BR.]

	top   = TL + s_frac(TR-TL)
	bot   = BL + s_frac(BR-BL)
	texel = top + t_frac(bot-top)

In the Nintendo-64, rather than doing a full bilinear interpolation using all
four samples, a triangular interpolation is performed that uses only three
points. The texture filter selects which three points to use depending on
where the sample point lies inside the 2x2 grid of texels. In certain cases, the
triangular filter can cause small anomalies. These cases occur when there are
drastic intensity changes from one texel to another in the texture as shown
in Figure 15-5. In this example, if the sampling point moves slightly from
one side of the diagonal to the other, the resulting color changes abruptly. In
general, it is best to prefilter an image so that sharp texture edges have at
least a slight intensity ramp.
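
The difference between the two filters can be seen by writing both out. The
bilerp function below follows the formulas of Figure 15-4. The three-point
version is only a sketch: it assumes the 2x2 grid is split along the TR-BL
diagonal and that the three texels of the triangle containing the sample are
blended linearly, which may not match the hardware in detail. The function
names are hypothetical.

```c
/* Full bilinear blend of four texels, as in Figure 15-4. */
double bilerp(double tl, double tr, double bl, double br,
              double s_frac, double t_frac)
{
    double top = tl + s_frac * (tr - tl);
    double bot = bl + s_frac * (br - bl);
    return top + t_frac * (bot - top);
}

/*
 * Three-point blend. Assumed split: samples with s_frac + t_frac <= 1 use
 * TL, TR, and BL; all other samples use BR, BL, and TR.
 */
double tri_filter(double tl, double tr, double bl, double br,
                  double s_frac, double t_frac)
{
    if (s_frac + t_frac <= 1.0)
        return tl + s_frac * (tr - tl) + t_frac * (bl - tl);
    return br + (1.0 - s_frac) * (bl - br) + (1.0 - t_frac) * (tr - br);
}
```

For a single bright texel at BR (TL = TR = BL = 0, BR = 1) sampled at the
center, bilerp gives 0.25 while tri_filter gives 0, because the fourth texel
does not contribute on that side of the diagonal. Isolated bright texels are
the kind of sharp edge the text recommends prefiltering.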

Figure 15-5 Triangular Filtering

With bilinear filtering, it is possible to scale a texture without the problems
of point sampling. Example 15-6 shows a texture rectangle with the texture
scaled by 1.5 in S and T:

Example 15-6 Scaled, Bilerped Textures

gsDPSetTextureFilter(G_TF_BILERP),
gsDPTextureRectangle(50<<2, 50<<2, 150<<2, 150<<2,
	G_TX_RENDERTILE,
	0, 0,
	3<<9, 3<<9),

Smooth scrolling of texture rectangles is discussed in "Smooth Scrolling" on 
page 286. 
Average Mode for 1:1 Ratio Sampling

There is a special case in which the texture filter can perform an exact
average using all four texels. This case occurs when the sample point lies
exactly in the center, i.e. s_frac = t_frac = 0.5. To enable the average mode
use the command:

Example 15-7 Enable Average Filtering

gsDPSetTextureFilter(G_TF_AVERAGE)

In order to force the sample point to be in the middle of the texel, set the
start point to 0.5 and then step by 1 texel per pixel. Example 15-8
demonstrates this:

Example 15-8 Averaging Textures

gsDPSetTextureFilter(G_TF_AVERAGE),
gsDPTextureRectangle(50<<2, 50<<2, 150<<2, 150<<2,
	G_TX_RENDERTILE,
	1<<4, 1<<4,
	1<<10, 1<<10)

Copy 

Copy mode is a special pipeline mode that allows fast image copies to the
framebuffer. Copy mode can be enabled as shown in Example 15-9.

Example 15-9 Enable Copy Mode

gsDPSetCycleType(G_CYC_COPY)

In copy mode, four horizontally adjacent texels are copied per clock as
shown in Figure 15-6.

Figure 15-6 Copy Mode

[Figure: four horizontally adjacent texels of the texture in Tmem copied in
one clock.]

In copy mode, since four texels are copied each clock, the step in S per clock
must be set to four. Example 15-10 shows a texture rectangle using copy
mode.

Example 15-10 Copy Mode Texture Rectangle

gsDPSetCycleType(G_CYC_COPY),
gsDPTextureRectangle(50<<2, 50<<2, 150<<2, 150<<2,
	G_TX_RENDERTILE,
	0, 0,
	4<<10, 1<<10),

Since copy mode bypasses most of the RDP pipeline, the filter settings are
not used. However, it is still necessary to disable perspective correction as
shown in Example 15-2. Also, copy mode is not valid for all texture types;
see "Copy" on page 259.
It is possible to scale textures in copy mode in the T (Y) direction only.
Note that in this case the rules for point sampled scaling apply: only integer
power of two scalings are allowed.

In copy mode, textures are copied directly to memory, so there is no
opportunity for color combiner operations like texture transparency, etc.
Copying is a write-only operation, so transparency using the normal
blending hardware is impossible. However, you can achieve 'cutout' and
'dithered' types of transparency using the alpha compare logic; see "Alpha
Compare Calculation" on page 315.

Simple Texture Effects

This section describes some 'sprite'-type effects that are commonly useful
for texture rectangles. This is intended to be a starting point for
programmers, not a complete list. Undoubtedly, clever programmers will
find the hardware allows many other effects.

Flip

Flip means to rotate an image 180 degrees around the X or Y axis or both as
shown in Figure 15-7.

Figure 15-7 Flipping Texture Rectangles

[Figure: four panels: original, flip X, flip Y, flip XY.]

If the texture map to be flipped has a size that is a power of two in the
direction of the flip, then you can use the mirror_enable ("Mirror Enable S/T"
on page 222) bit in the tile descriptor to perform the flip. For example,
suppose we have loaded a 32x32 16-bit RGBA texture into Tmem. To flip the
texture in X we can use the code in Example 15-11.

Example 15-11 Flip a Texture in X

gsDPSetTile(G_IM_FMT_RGBA, G_IM_SIZ_16b, 8, 0,
	G_TX_RENDERTILE, 0,
	G_TX_MIRROR, 5, G_TX_NOLOD,	/* s */
	G_TX_NOMIRROR, 5, G_TX_NOLOD),	/* t */
gsDPTextureRectangle(50<<2, 50<<2, 150<<2, 150<<2,
	G_TX_RENDERTILE,
	32<<5, 0,	/* start s on mirror boundary */
	1<<10, 1<<10),

Note that the S start point is 32. Since the texture will be mirrored when the
S coordinate is between 32 and 63 if the mirror enable bit in the tile is set,
we get the effect of a flipped texture. If the mirror bit is disabled, the
texture will remain unflipped.
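
The way the mask and mirror bit turn an S coordinate of 32 to 63 into a
reversed copy of texels 31 to 0 can be written out directly. This is a
host-side sketch of the behavior described above, not hardware-exact code:
it assumes the coordinate is first wrapped with the mask, and then reversed
whenever the first bit above the mask is set and mirroring is enabled. The
name mask_coord is hypothetical.

```c
#include <assert.h>

/* Tile coordinate after masking, with optional mirroring. */
unsigned mask_coord(unsigned coord, unsigned mask_bits, int mirror_enable)
{
    assert(mask_bits > 0 && mask_bits <= 10);
    unsigned size = 1u << mask_bits;        /* wrap interval in texels */
    unsigned wrapped = coord & (size - 1);  /* plain wrap */

    /* In every second interval, run through the texels backwards. */
    if (mirror_enable && (coord & size))
        wrapped = (size - 1) - wrapped;
    return wrapped;
}
```

With a mask of 5, coordinate 32 maps to texel 31 and coordinate 63 maps to
texel 0 when mirroring is enabled. With mirroring disabled the same
coordinates map to texels 0 and 31, which is the unflipped image.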

For textures that are not power of two sizes, we must use another approach
for flipping the textures. Suppose we have loaded a 48x42 16-bit RGBA
texture in Tmem and would like to flip the texture in T. The code in
Example 15-12 would accomplish this.

Example 15-12 Flip a Texture in Y (non power-of-two size)

gsDPTextureRectangle(50<<2, 50<<2, 98<<2, 92<<2,
	G_TX_RENDERTILE,
	0, 41<<5,	/* start t at bottom of texture */
	1<<10, ((-1)<<10) & 0xffff),	/* step from bottom to top of texture */

Note that we changed the texture T coordinate to start at the bottom of the
texture and changed the increment in T so that we step from the bottom of the
texture to the top, thus flipping the texture in Y.
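
The negative step is a signed 5.10 value passed in 16 bits, so -1 becomes
0xFC00. The sketch below shows which texel row each pixel row samples under
that encoding. It assumes the hardware sign-extends both 16-bit fields and
truncates toward negative infinity; the function names are hypothetical.

```c
#include <stdint.h>

/* Interpret a 16-bit field as a two's complement signed value. */
int32_t sign_extend16(uint16_t raw)
{
    int32_t v = raw;
    if (v & 0x8000)
        v -= 0x10000;
    return v;
}

/* Texel row sampled at a given pixel row of the rectangle. */
int sampled_row(uint16_t t_start_s10_5, uint16_t dtdy_s5_10, int pixel_row)
{
    /* Bring the s10.5 start point up to 10 fractional bits. */
    int32_t t = sign_extend16(t_start_s10_5) * 32;
    t += sign_extend16(dtdy_s5_10) * pixel_row;

    /* Floor division by 1024 that is also correct for negative t. */
    return (int)((t >= 0) ? (t / 1024) : -((-t + 1023) / 1024));
}
```

With the start point 41<<5 and the step 0xFC00 from Example 15-12, pixel row
0 samples texel row 41 and pixel row 41 samples texel row 0, so the 42 rows
of the texture appear in reverse order.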

There is also a variation of the texture rectangle called
gsDPTextureRectangleFlip() that swaps the S and T coordinates in hardware.
If we had a display list as in Example 15-13

Example 15-13 TextureRectangleFlip Command

gsDPTextureRectangleFlip(50<<2, 50<<2, 98<<2, 92<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<10, 1<<10)

we would get a resulting image as shown in Figure 15-8.

Figure 15-8 TextureRectangleFlip Command

[Figure: two panels: the original image and the TextureRectangleFlip result.]

Mirror 

Mirroring is also useful for data compression in cases where the texture has
axial symmetry. For example, a tree could be created with half of a tree
texture that was mirrored in X as shown in Figure 15-9.

Figure 15-9 Mirrored Tree

[Figure: two panels: the original texture (half a tree) and a texture
rectangle using mirroring (the whole tree).]

As mentioned before, to use hardware mirroring, the texture must be a
power of two size in the direction to be mirrored. Suppose the tree texture
above is a 16x40 16-bit RGBA texture. Example 15-14 will render the
mirrored tree as shown in Figure 15-9.

Example 15-14 Mirrored Tree

gsDPLoadTextureTile(tree, G_IM_FMT_RGBA, G_IM_SIZ_16b,
	16, 40,
	0, 0, 15, 39,
	0,
	G_TX_MIRROR, G_TX_CLAMP,
	4, G_TX_NOMASK,
	G_TX_NOLOD, G_TX_NOLOD),
gsDPTextureRectangle(50<<2, 50<<2, 82<<2, 90<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<10, 1<<10),



Wrap

Wrapping allows a small texture to fill a larger rectangle by repeating the
texture over and over. In the Nintendo-64, wrapping is enabled if the mask
(see "Mask S,T" on page 223) in the tile descriptor is non-zero and the clamp
bit (see "Clamp S,T" on page 224) in the tile descriptor is not set for the
coordinate in question. The mask determines which power of two the wrap
occurs on. Figure 15-10 shows the results for various wrap boundaries
using a single texture. Wrapping can be used in copy mode except for

Figure 15-10 Wrapping on Several Boundaries of the Same Texture

[Figure: four panels: original texture, wrap at 4, wrap at 8, wrap at 16.]

Wrapping can also be used in conjunction with mirroring. Suppose we
wanted to wrap the mirrored tree shown in Figure 15-9. This could be done
using the code in Example 15-15.

Example 15-15 Wrapped and Mirrored Tree

gsDPLoadTextureTile(tree, G_IM_FMT_RGBA, G_IM_SIZ_16b,
	16, 40,
	0, 0, 15, 39,
	0,
	G_TX_MIRROR | G_TX_WRAP, G_TX_CLAMP,
	4, G_TX_NOMASK,
	G_TX_NOLOD, G_TX_NOLOD),
gsDPTextureRectangle(50<<2, 50<<2, ..., 90<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<10, 1<<10),

Note that the G_TX_WRAP above is really unnecessary because wrapping is
implicit as we have a non-zero mask value and are not clamping. It is
included just for documentation purposes. The resulting image would look
like Figure 15-11.

Figure 15-11 Wrapped and Mirrored Tree

[Figure: two panels: the texture and a texture rectangle using wrapping and
mirroring.]

Sliding Textures

It is easy to slide a texture relative to the rectangle primitive by changing
the tile descriptor values of SL and TL (see "SL,TL" on page 224). Using the
tile descriptor allows the texture coordinates to be statically defined. The
effect of changing SL, TL is shown in Figure 15-12.

Figure 15-12 Effect of Changing SL, TL

[Figure: a texture sliding inside a texture rectangle, with the -t and +t
directions marked.]

Suppose we have a 32x32 4-bit I texture loaded in Tmem. In Example 15-16,
two rectangles are rendered with the texture placed in different positions
using SL and TL.

Example 15-16 Sliding Texture Using SL, TL

gsDPSetTileSize(G_TX_RENDERTILE, 50, 50, 82, 82),
gsDPTextureRectangle(50<<2, 50<<2, 82<<2, 82<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<10, 1<<10),
gsDPSetTileSize(G_TX_RENDERTILE, 80, 100, 112, 132),
gsDPTextureRectangle(100<<2, 100<<2, 132<<2, 132<<2,
	G_TX_RENDERTILE,
	0, 0,
	1<<10, 1<<10),

Note that SH and TH are only used when clamping. Because SL and TL are
unsigned, the texture rectangle coordinates must be offset to allow sliding
above the top edge or to the left of the left edge of the rectangle. This is
shown in Figure 15-13 and Example 15-17.

Figure 15-13 Biasing Texture Coordinates for Positive SL, TL

[Figure: a texture rectangle with the S coordinate biased; the +t direction
is marked.]

Example 15-17 Biased Coordinates for Positive SL

gsDPSetTileSize(G_TX_RENDERTILE, 25, 50, 57, 82),
gsDPTextureRectangle(50<<2, 50<<2, 82<<2, 82<<2,
	G_TX_RENDERTILE,
	50<<5, 0,
	1<<10, 1<<10),

Smooth Scrolling 



Scrolling involves positioning texture rectangles on the screen and also
positioning the texture within the rectangle. The rectangle geometry can be
positioned with 2 bits of fractional precision in X and Y. The texture
coordinates can be specified with 5 bits of fractional precision in S and T.
To get the smoothest scrolling, you can use the S and T start point as the
fractional part and the rectangle's X and Y position for the integer part. So
effectively, you are sliding the texture to achieve fractional displacements.
Example 15-18 shows how such positioning could be achieved. Keep in
mind that a border area around the texture must be present so that the
texture doesn't clamp when it slides off the rectangle.

Example 15-18 Accurate Positioning Using S and T

float xpos = 10.375, ypos = 19.432;
int xi, xf, yi, yf;

xi = (int) xpos;
yi = (int) ypos;
xf = 32 - 32 * (xpos - xi);
yf = 32 - 32 * (ypos - yi);

gDPTextureRectangle(glistp++,
        xi<<2, yi<<2, (xi+32)<<2, (yi+32)<<2,
        G_TX_RENDERTILE,
        xf, yf,
        1<<10, 1<<10);
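
The split between the integer rectangle position and the fractional texture
start point can be sketched in isolation. The following is only an
illustration: the ScrollPos type and split_scroll() function are hypothetical
names that are not part of gbi.h, and the sketch assumes the fractional start
is computed as 32 minus 32 times the fraction (5 bits of fraction).

```c
/* Hypothetical helper type: whole-pixel position plus a texture start
 * coordinate expressed in 1/32 texel units (5 fractional bits). */
typedef struct {
    int pos;   /* integer rectangle position in pixels */
    int frac;  /* texture start coordinate in 1/32 texel units */
} ScrollPos;

/* Split a non-negative position into the integer rectangle position and
 * the fractional texture offset. */
static ScrollPos split_scroll(float p)
{
    ScrollPos r;
    r.pos = (int)p;
    r.frac = (int)(32.0f - 32.0f * (p - (float)r.pos));
    return r;
}
```

The pos field would be shifted left by 2 for the rectangle coordinates, and
the frac field would be passed as the S or T start value.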

Billboards 

Billboards are textures that define complex outlines by using texture
transparency. For example, rather than creating a tree using polygons, you
can use an image of a tree, with the portion of the image outside the tree
having an alpha of 0 (transparent) and the interior of the tree having an
alpha of 1 (opaque). This is shown graphically in Figure 15-14. This
technique allows complex scenes to be built by compositing simple images
together.

Figure 15-14 Texture Billboard

[Figure: original texture, with alpha 0 (transparent) outside the tree and
alpha 1 (opaque) inside it; texture rectangle using wrapping and mirroring]



It is important to consider the antialiasing of the edges created by the
texture's alpha pattern. If only 1 bit of alpha is used, then the pixel is
either written or not. If more bits of alpha are used to create a smoother
transition from opaque to transparent, the edges will be blended with the
background. Billboards should be rendered after all opaque background
objects have been rendered. There are several texel formats that allow
multiple bits of alpha (see "Color Index Frame Buffer" on page 298) and ways
of combining different types (see "Combining Types" on page 290). To render
this type of antialiased texture billboard, you must be in one or two cycle
mode and you should use the render mode G_RM_AA_TEX_EDGE. See "Texture Edge
Mode, TEX_EDGE" on page 332 for further details.

Texture billboards can also be rendered in a write-only fashion but this also
implies no antialiasing of the texture edge. This mode is called 'alpha
compare' and basically thresholds the texel alpha with a register alpha value
or a random alpha source to generate a write enable for the pixel. See
"Alpha Compare Calculation" on page 315 for more details.

Cloud (CLD) Render Mode

Cloud render mode is intended for rendering texture billboards that are not
opaque, for example smoke clouds and explosions. These are special cases
because care must be taken not to disturb the antialiased edges of things
behind the transparent cloud, because these edges will be seen through the
cloud.

Intensity (I) Textures

Intensity textures are useful because they are more compact and should be
used in cases where a large number of colors is not necessary. For example,
a 4-bit I texture can be as large as 128x64 texels. Normally, the user would
like the primitive to have some specific color, and the I texture should
modulate that color. For example, to create a tree you could use two I
textures, one for the brown trunk and one for the green tree top. You can use
one of the many register colors in the color combiner to define the primitive
color. In Example 15-19 we use primitive color to define the colors of the
trunk and treetop.

Example 15-19 Intensity Texture Modulating Primitive Color

gsDPSetCombineMode(G_CC_MODULATEI_PRIM, G_CC_MODULATEI_PRIM),
gsDPSetPrimColor(0, 0, 205, 51, 51, 255), /* brown */
gsDPLoadTextureTile_4b(trunk, G_IM_FMT_I, 16, 40,
        0, 0, 15, 39,
        0,
        G_TX_MIRROR, G_TX_CLAMP,
        4, G_TX_NOMASK,
        G_TX_NOLOD, G_TX_NOLOD),
gsDPTextureRectangle(50<<2, 100<<2, 82<<2, 140<<2,
        G_TX_RENDERTILE,
        0, 0,
        1<<10, 1<<10),
gsDPSetPrimColor(0, 0, 0, 139, 0, 255), /* green */
gsDPLoadTextureTile_4b(treetop, G_IM_FMT_I, 32, 32,
        0, 0, 15, 39,
        0,
        G_TX_MIRROR, G_TX_CLAMP,
        5, G_TX_NOMASK,
        G_TX_NOLOD, G_TX_NOLOD),
gsDPTextureRectangle(44<<2, 68<<2, 108<<2, 100<<2,
        G_TX_RENDERTILE,
        0, 0,
        1<<10, 1<<10),



By interpolating between two different colors using the intensity as the
parameter, it is possible to achieve two-color textures. The combine mode
G_CC_BLENDPEDECALA interpolates between primitive color and
environment color using an I texture. For this combine mode, when the
texel is 0 the pixel will be environment color; when the texel is all ones,
the pixel will be primitive color. Example 15-20 assumes an I texture has
already been loaded into Tmem.

Example 15-20 Two-Color Texture

gsDPSetCombineMode(G_CC_BLENDPEDECALA, G_CC_BLENDPEDECALA),
gsDPSetPrimColor(0, 0, 205, 51, 51, 205), /* brown */
gsDPSetEnvColor(0, 200, 0, 255), /* green */
gsDPTextureRectangle(50<<2, 100<<2, 82<<2, 140<<2,
        G_TX_RENDERTILE,
        0, 0,
        1<<10, 1<<10),

Since for intensity textures the texel value is also copied onto the alpha
channel, you can achieve transparency using an intensity texture. For
example, if you define a 4-bit texture of some text to have an intensity of
0xf for the characters and a value of 0 elsewhere, and then render using the
combine mode G_CC_BLENDPEDECALA and the render mode
G_RM_TEX_EDGE, the text will have the primitive color and be transparent
elsewhere. Note that if the edges of the text are filtered to give smooth
edges, then the text will have an intensity ramp at the edges. If you use an
antialiased render mode, such as G_RM_AA_TEX_EDGE, then the text will
look smoother than if a 1-bit alpha texture like 4-bit IA or 16-bit RGBA were
used.

Intensity Alpha (IA) Textures 

This texture type defines an intensity (I) channel and a separate alpha
channel (A). This type is convenient where the transparency of the texture
must be defined separately from the intensity. The sizes include 4-bit (3 bits
of I and 1 bit of A), 8-bit (4 bits of I and 4 bits of A), and 16-bit (8 bits
of I and 8 bits of A). Keep in mind when using 1-bit alphas that the pixel
will be either written or not, depending on the alpha bit. Therefore, the
transparency channel is not antialiased (the texture filter cannot 'create
data' to smooth the edge). Scaling a 1-bit alpha texture can result in
blocky-looking outlines.
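
To make the three IA sizes concrete, here is a minimal sketch that unpacks
each one into separate 8-bit intensity and alpha values. The bit layouts
(intensity in the high bits, alpha in the low bits) and the bit-replication
used to widen the channels are assumptions of this sketch, and the IATexel
type and unpack_* functions are hypothetical names, not library calls.

```c
#include <stdint.h>

/* Hypothetical unpacked texel: both channels widened to 8 bits. */
typedef struct { uint8_t i; uint8_t a; } IATexel;

/* 4-bit IA: assumed layout IIIA (3 bits intensity, 1 bit alpha). */
static IATexel unpack_ia4(uint8_t nibble)
{
    IATexel t;
    uint8_t i3 = (nibble >> 1) & 0x7;
    t.i = (uint8_t)((i3 << 5) | (i3 << 2) | (i3 >> 1)); /* replicate bits */
    t.a = (nibble & 0x1) ? 255 : 0;   /* 1-bit alpha: fully on or off */
    return t;
}

/* 8-bit IA: assumed layout IIIIAAAA. */
static IATexel unpack_ia8(uint8_t byte)
{
    IATexel t;
    uint8_t i4 = byte >> 4, a4 = byte & 0xF;
    t.i = (uint8_t)((i4 << 4) | i4);
    t.a = (uint8_t)((a4 << 4) | a4);
    return t;
}

/* 16-bit IA: assumed layout IIIIIIII AAAAAAAA. */
static IATexel unpack_ia16(uint16_t word)
{
    IATexel t;
    t.i = (uint8_t)(word >> 8);
    t.a = (uint8_t)(word & 0xFF);
    return t;
}
```

Note how the 4-bit case can only produce an alpha of 0 or 255, which is why
its outline cannot be smoothed by the texture filter.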
Color (RGBA) Textures

There are two sizes of RGBA textures: 16-bit (5 bits R, 5 bits G, 5 bits B,
1 bit A), and 32-bit (8 bits R, 8 bits G, 8 bits B, 8 bits A). While 16-bit
RGBA textures are popular because they are easy to create and model with,
they have the disadvantage of only a 1-bit alpha channel. This can be
overcome in certain cases, as discussed in "Combining Types" on page 290.
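
As an illustration of the 16-bit format, the sketch below packs 8-bit
channels into a 5/5/5/1 texel. The bit order (red in the most significant
bits, alpha in the least significant bit) and the alpha cutoff of 128 are
assumptions of this sketch; pack_rgba5551() is a hypothetical helper, not a
library function.

```c
#include <stdint.h>

/* Pack 8-bit channels into a 16-bit 5/5/5/1 texel.
 * Assumed layout, most significant bit first: RRRRR GGGGG BBBBB A.
 * Alpha collapses to a single bit, so any smooth alpha ramp is lost. */
static uint16_t pack_rgba5551(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) |
                      ((b >> 3) << 1) | (a >= 128 ? 1 : 0));
}
```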

Color Index (CI) Textures

Color index textures come in two sizes, 8-bit and 4-bit. When using color
index textures only half the Tmem is used for textures (2 KBytes). The other
half is used to store the lookup table (TLUT) that converts the index texel
into either 16-bit RGBA or 16-bit IA types. It is also possible to copy 8-bit
CI textures directly to an 8-bit framebuffer as discussed in "Color Index
Frame Buffer" on page 298.

4-bit CI textures must select one of 16 possible palettes. Each palette has 16
entries. The g*DPLoadTLUT_pal16 can be used to load an individual palette.
The palette to use is defined in the tile descriptor (normally you would
define the palette in the g*DPLoadTexture* command), so different tiles can
select different palettes.

You can use a 4-bit CI texture to provide more alpha bits than is possible
with the 4-bit IA type, because the TLUT can hold 16-bit IA values. Therefore,
you could look up 16 levels of alpha with a 4-bit CI sprite as compared to 1
level for a 4-bit IA sprite.
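
The palette selection can be pictured as a two-part table index. The sketch
below models only the logical lookup: the flat tlut[] array is a stand-in
for the table held in Tmem, whose physical layout is not modeled here, and
ci4_lookup() is a hypothetical name.

```c
#include <stdint.h>

#define TLUT_ENTRIES 256   /* 16 palettes of 16 entries each */

/* Look up a 4-bit color index in the selected 16-entry palette.
 * The returned 16-bit value would be interpreted as RGBA or IA. */
static uint16_t ci4_lookup(const uint16_t tlut[TLUT_ENTRIES],
                           unsigned palette, unsigned index)
{
    return tlut[((palette & 0xF) << 4) | (index & 0xF)];
}
```

Because every entry is a full 16-bit value, a 4-bit index can reach an IA
entry with 8 bits of alpha, which is how the CI type gains alpha levels.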

Combining Types

As mentioned previously, 16-bit RGBA textures have only a 1-bit alpha
channel. If you want to have a smoothly antialiased texture edge using the
16-bit RGBA type, you must combine two types of texture. Example 15-21
shows how a separate alpha texture with a 4-bit I type is combined with a
16-bit RGBA type to get smoother edges on a sprite.

Example 15-21 Interpolate Between Two Tiles

#define MULTIBIT_ALPHA 0, 0, 0, TEXEL0, 0, 0, 0, TEXEL1

gsDPSetCycleType(G_CYC_2CYCLE),
gsDPSetTextureLOD(G_TL_TILE),
gsDPSetCombineMode(MULTIBIT_ALPHA, G_CC_PASS2),
gsDPSetRenderMode(G_RM_AA_TEX_EDGE, G_RM_AA_TEX_EDGE2),
/* load color part of texture */
gsDPLoadMultiTile(color,
        0, /* Tmem address in 64-bit words */
        G_TX_RENDERTILE, /* tile */
        G_IM_FMT_RGBA, G_IM_SIZ_16b,
        32, 32,
        0, 0, 31, 31,
        0,
        G_TX_NOMIRROR, G_TX_NOMIRROR,
        G_TX_NOMASK, G_TX_NOMASK,
        G_TX_NOLOD, G_TX_NOLOD),
/* load alpha part of texture */
gsDPLoadMultiTile_4b(alpha,
        256, /* Tmem address in 64-bit words */
        G_TX_RENDERTILE+1, /* tile */
        G_IM_FMT_I,
        32, 32,
        0, 0, 31, 31,
        0,
        G_TX_NOMIRROR, G_TX_NOMIRROR,
        G_TX_NOMASK, G_TX_NOMASK,
        G_TX_NOLOD, G_TX_NOLOD),
gsDPTextureRectangle(
        50<<2, 50<<2, 82<<2, 82<<2,
        G_TX_RENDERTILE,
        0, 0,
        1<<10, 1<<10),



The idea here is that in two-cycle mode we get two texel values, one from
the 16-bit RGBA texture and one from the 4-bit I texture. In the color
combiner, we program the alpha combiner to use the 4-bit I texture (the 1-bit
A of the RGBA texture is not used). In the color combiner, we select the RGB
texture as the color source. Since we are using both cycles for this trick, it
is not possible to use mipmapping or other two-cycle modes simultaneously.
Note that you could have used an 8-bit I texture for the alpha channel if you
needed more alpha resolution.

Multi-Tile Effects



There are eight tile descriptors available in the tile memory of the RDP.
These tile descriptors contain information about the type and size of tiles
and where these tiles are located in Tmem. In two-cycle mode, texture from
two tiles is available for each pixel. Many effects are possible by
manipulation of tile descriptors and combining of the textured pixels.

In the g*DPLoadTexture* commands, a simple two-tile system is used for
loading and rendering. In this system, the G_TX_LOADTILE is used for
loading a tile starting at Tmem address 0 and the tile descriptor
G_TX_RENDERTILE is set up for rendering the tile. This is a
double-buffering scheme which avoids having to insert tile sync commands
in the load macro. Notice that since each tile is loaded at Tmem address 0
and the G_TX_RENDERTILE is always used for rendering, we cannot use
these macros for loading multiple tiles into Tmem.

In order to allow the user to manage Tmem for multi-tile effects, the load
macros g*DPLoadMultiTile and g*DPLoadMultiBlock were created. These
macros allow the user to specify the Tmem address of the tile and the tile
descriptor number to use when rendering this tile.

Simple Morph 

One simple use of two tiles is to linearly interpolate, using a parameter to
indicate the blend amount, between the tiles. A register value in the color
combiner, such as primitive alpha, can be used as the 'slider' to blend
between the two textures as shown in Example 15-22. Notice that we define
our own color combine mode to achieve this effect, since gbi.h didn't have
the mode we needed.

Example 15-22 Interpolate Between Two Tiles

#define MY_MORPH TEXEL1, TEXEL0, PRIMITIVE_ALPHA, TEXEL0, \
                 TEXEL1, TEXEL0, PRIMITIVE, TEXEL0

gsDPSetCycleType(G_CYC_2CYCLE),
gsDPSetTextureLOD(G_TL_TILE),
gsDPSetPrimColor(0, 0, 0, 0, 0, 128), /* 0.5 blend */
gsDPSetCombineMode(MY_MORPH, G_CC_PASS2),
gsDPLoadMultiTile(face0,
        0, /* Tmem address in 64-bit words */
        G_TX_RENDERTILE, /* tile */
        G_IM_FMT_RGBA, G_IM_SIZ_16b,
        32, 32,
        0, 0, 31, 31,
        0,
        G_TX_NOMIRROR, G_TX_NOMIRROR,
        G_TX_NOMASK, G_TX_NOMASK,
        G_TX_NOLOD, G_TX_NOLOD),
gsDPLoadMultiTile(face1,
        256, /* Tmem address in 64-bit words */
        G_TX_RENDERTILE+1, /* tile */
        G_IM_FMT_RGBA, G_IM_SIZ_16b,
        32, 32,
        0, 0, 31, 31,
        0,
        G_TX_NOMIRROR, G_TX_NOMIRROR,
        G_TX_NOMASK, G_TX_NOMASK,
        G_TX_NOLOD, G_TX_NOLOD),
gsDPTextureRectangle(
        50<<2, 50<<2, 82<<2, 82<<2,
        G_TX_RENDERTILE,
        0, 0,
        1<<10, 1<<10),

By making the primitive alpha an animation variable, a simple 'morph'
effect can be achieved.
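
The arithmetic behind the morph can be sketched in software for a single
color channel. This assumes the combiner evaluates (A - B) * C + D with
A = TEXEL1, B = TEXEL0, C = primitive alpha and D = TEXEL0; the function name
is hypothetical and the hardware's exact rounding is not modeled.

```c
#include <stdint.h>

/* One channel of (TEXEL1 - TEXEL0) * alpha + TEXEL0, with alpha in 0..255
 * treated as a fraction of 256. */
static uint8_t morph_channel(uint8_t texel0, uint8_t texel1, uint8_t alpha)
{
    int diff = (int)texel1 - (int)texel0;
    return (uint8_t)((int)texel0 + (diff * (int)alpha) / 256);
}
```

With alpha at 0 the result is the first tile, and raising alpha toward 255
moves the result toward the second tile.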

Smoothing Flip-Book Animations 

Often sprite animations are a sequence of key frames which are selected at
the appropriate time by some animation variable. The linear interpolation
between two images as described in "Simple Morph" above can be used to
smoothly transition between two key frames. Imagine a series of n images
of an animation selected using an animation variable frame. The integer part
of frame is called frame_i and the fractional part is called frame_f. An
algorithm for smoothing the sequence is described in Example 15-23.

Example 15-23 Smoothing an Animation Sequence

Load tiles frame_i and frame_i+1 into Tmem
Set primitive alpha = 256 * frame_f
Render the rectangle using MY_MORPH combiner mode

The frames do not necessarily have to be related in time. For example, you
could interpolate between different flame images that are randomly
selected to create a fire effect.
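
The bookkeeping for the smoothing steps can be sketched as follows. The
FlipBlend type and flip_blend() function are hypothetical; wrapping the
second frame back to the start is an assumption that suits a looping
animation, and the alpha is clamped so it fits in 8 bits.

```c
typedef struct {
    int first;        /* index of the first key frame to load (frame_i) */
    int second;       /* index of the next key frame, wrapping at n */
    int prim_alpha;   /* blend amount, 256 * frame_f, clamped to 0..255 */
} FlipBlend;

/* frame is assumed to be non-negative and n is assumed to be positive. */
static FlipBlend flip_blend(float frame, int n)
{
    FlipBlend fb;
    int whole = (int)frame;
    float frac = frame - (float)whole;

    fb.first = whole % n;
    fb.second = (whole + 1) % n;
    fb.prim_alpha = (int)(256.0f * frac);
    if (fb.prim_alpha > 255)
        fb.prim_alpha = 255;
    return fb;
}
```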

Shrinking Sprites 

In the previous discussion of scaling in "Bilinear Filtering" on page 273 we
only discussed scaling a sprite to a larger size since scaling it smaller
would result in aliasing effects. It is possible to effectively shrink an
image by interpolating between two tiles, one of which is half the size of
the other tile. This is shown in Figure 15-15. Prim_lod_frac is a register in
the color combiner that can be used to indicate the fractional distance
between the two 'levels-of-detail' of the sprite. Note that there is no
special reason we used this register as the interpolation parameter, other
than its name suggests this usage.

Figure 15-15 Shrinking Sprite

[Figure: the full-size tile and the half-size tile, Tile 1]



One of the tile descriptor parameters is the shift (see "Shift S,T" on page
223) that describes how many places to bitwise shift the tile coordinates for
the primitive. This implies that one tile's size is related to the other's by
some integer shift, but the tiles don't necessarily have to be power of two
sizes. Example 15-24 shows the code to create a sprite that is 0.75 the size
of the larger image. The user must scale the size of the rectangle primitive
by the desired amount as well.

Example 15-24 Shrinking a Sprite

#define MY_LOD TEXEL1, TEXEL0, PRIM_LOD_FRAC, TEXEL0, \
               TEXEL1, TEXEL0, PRIM_LOD_FRAC, TEXEL0

gsDPSetCycleType(G_CYC_2CYCLE),
gsDPSetTextureLOD(G_TL_TILE),
gsDPSetPrimColor(0, 128, 0, 0, 0, 0), /* 50% lod_frac */
gsDPSetCombineMode(MY_LOD, G_CC_PASS2),
gsDPLoadMultiTile(face0,
        0, /* Tmem address in 64-bit words */
        G_TX_RENDERTILE, /* tile */
        G_IM_FMT_RGBA, G_IM_SIZ_16b,
        32, 32,
        0, 0, 31, 31,
        0,
        G_TX_NOMIRROR, G_TX_NOMIRROR,
        G_TX_NOMASK, G_TX_NOMASK,
        G_TX_NOLOD, G_TX_NOLOD),
gsDPLoadMultiTile(face1,
        256, /* Tmem address in 64-bit words */
        G_TX_RENDERTILE+1, /* tile */
        G_IM_FMT_RGBA, G_IM_SIZ_16b,
        16, 16,
        0, 0, 15, 15,
        0,
        G_TX_NOMIRROR, G_TX_NOMIRROR,
        G_TX_NOMASK, G_TX_NOMASK,
        G_TX_NOLOD, G_TX_NOLOD),
gsDPTextureRectangle(
        50<<2, 50<<2, 82<<2, 82<<2,
        G_TX_RENDERTILE,
        8<<5, 8<<5,
        1<<10, 1<<10),
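
The relation between the shrink factor and the interpolation fraction can be
sketched as below. This assumes a linear mapping between the full-size tile
(scale 1.0) and the half-size tile (scale 0.5), which matches the 0.75 and
50% pairing used above; lod_frac_for_scale() is a hypothetical helper and not
part of gbi.h.

```c
/* Map a shrink factor in [0.5, 1.0] to an 8-bit interpolation fraction:
 * 1.0 selects only the full-size tile (fraction 0), 0.5 selects only the
 * half-size tile (fraction 255). Out-of-range factors are clamped. */
static int lod_frac_for_scale(float scale)
{
    float t;
    int frac;

    if (scale > 1.0f) scale = 1.0f;
    if (scale < 0.5f) scale = 0.5f;
    t = (1.0f - scale) / 0.5f;      /* 0.0 at full size, 1.0 at half size */
    frac = (int)(t * 256.0f);
    return frac > 255 ? 255 : frac;
}
```

The returned value would be placed in the lod fraction argument of the
primitive color command.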



Texture Decals 



We can use the alpha of one tile to select between the texel color of two
different tiles to create a texture decal. Figure 15-16 shows an example of a
flag created using texture decals. The insignia of the flag has transparency
around its edges. After mirroring and wrapping once, the texture is
clamped. In the color combiner, the texture alpha is used to interpolate
between the flag stripes and the insignia. Where alpha is zero, the stripes
will show; where the alpha is one, the insignia will show.

Figure 15-16 Texture Decals

[Figure: two tiles, one labeled tile 1, and a region labeled alpha = 1.0]



Need example code... 

Interference Effects

Multiplying two textures together, especially while sliding the textures
relative to each other, can create interference patterns. For example, a
horizontal stripe pattern multiplied by a vertical stripe pattern creates a
set of bright spots at the intersections of the stripes. If the stripes are
slid relative to each other, the points will move also. Multiplying can also
be used to modulate one image with another. For example, Figure 15-17 shows
a complex wave resulting from the modulation of two simple waves.

Figure 15-17 Modulation

[Figure: two simple wave textures modulated to give a complex wave texture]

Tiling Large Images



Sometimes it is desirable to render large textures, i.e. textures too large
to fit entirely into Tmem. This can be accomplished via 'tiling' or breaking
the large image up into smaller rectangular tiles that do fit into Tmem.
These tiles are rendered onto primitives that form a mesh coincident with the
texture tiling. The textured rectangle primitive is a useful primitive for
tiling a background image in a sprite game, for instance. If you point sample
the texture tile, it is only necessary to load the number of texels you wish
to display. However, if you want to bilinearly filter the texture, you must
load a border region of one texel around the tile so that the interpolation
works correctly at the edges of the tile. See "Bilinear Filtering and Point
Sampling" on page 236 for more information.
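
The tile bookkeeping can be sketched as follows. The TexelRect type and both
functions are hypothetical helpers; clamping the border at the image bounds
is one possible choice made by this sketch, not a requirement stated in the
text.

```c
typedef struct { int s0, t0, s1, t1; } TexelRect;   /* inclusive bounds */

/* Number of tiles needed to cover `size` texels with tiles of `tile` texels. */
static int tile_count(int size, int tile)
{
    return (size + tile - 1) / tile;
}

/* Region of the source image to load for tile (tx, ty). With border = 1 the
 * region grows by one texel on each side, clamped to the image bounds. */
static TexelRect tile_load_region(int tx, int ty, int tile_w, int tile_h,
                                  int img_w, int img_h, int border)
{
    TexelRect r;
    r.s0 = tx * tile_w - border;
    r.t0 = ty * tile_h - border;
    r.s1 = (tx + 1) * tile_w - 1 + border;
    r.t1 = (ty + 1) * tile_h - 1 + border;
    if (r.s0 < 0) r.s0 = 0;
    if (r.t0 < 0) r.t0 = 0;
    if (r.s1 > img_w - 1) r.s1 = img_w - 1;
    if (r.t1 > img_h - 1) r.t1 = img_h - 1;
    return r;
}
```

A border of 0 corresponds to point sampling and a border of 1 to bilinear
filtering. The enlarged region still has to fit into Tmem, so the tile size
may need to be reduced when a border is used.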
Color Index Frame Buffer



You might have noticed that one of the color image types that is available is
the 8-bit I type. You can use this mode to render color index images into the
framebuffer. Before displaying the 8-bit image, however, you must read the
8-bit image into Tmem and dereference into a 16-bit RGBA image. Note that
the 8-bit frame buffer can share the same memory as the 16-bit frame buffer
by placing the 8-bit buffer in the high half of the 16-bit buffer. This
technique can give better performance than rendering directly to a 16-bit
framebuffer because the memory accesses are more efficient. Also, the initial
clear of the framebuffer is faster because the buffer is half the size.

There are, however, restrictions when using this technique. Since we are
rendering an 8-bit CI image, you must texture map objects with 8-bit CI
textures (but don't dereference yet) and use shade colors that fit into your
palette. You cannot filter the textures since the texture values in the
pipeline are indices. You also cannot blend with memory colors (unless your
palette is laid out specifically to allow this), although you can achieve
cut-out type transparency. Antialiasing is also not available for this
framebuffer type, because no coverage is stored.

These restrictions sound severe, but may be practical for some sprite games,
especially those that use sort priority and can render totally in copy mode.
In copy mode (and 1 or 2-cycle mode) you can get cut-out transparency by
using the alpha compare logic and reserving an index (0 is a good choice)
that indicates transparency. If the index 0 means transparent, then setting
the blend alpha to 1 and enabling alpha compare (G_AC_THRESHOLD)
would allow all pixels with any index greater than or equal to 1 to be
written to the framebuffer but pixels with index 0 would not be written.
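
The cut-out behavior described above can be modeled in software. This is only
a sketch of the threshold compare as the text describes it (write when the
value is greater than or equal to the threshold); the function names are
hypothetical and no hardware detail beyond that comparison is modeled.

```c
#include <stdint.h>

/* Threshold compare: a pixel is written only when its value is greater than
 * or equal to the threshold. Here the 8-bit color index plays the role of
 * alpha and blend_alpha is the threshold. */
static int ci_write_enable(uint8_t index, uint8_t blend_alpha)
{
    return index >= blend_alpha;
}

/* Copy one row of indices, skipping those that fail the compare.
 * Returns the number of pixels written. */
static int copy_row_cutout(uint8_t *dst, const uint8_t *src, int n,
                           uint8_t blend_alpha)
{
    int written = 0;
    for (int x = 0; x < n; x++) {
        if (ci_write_enable(src[x], blend_alpha)) {
            dst[x] = src[x];
            written++;
        }
    }
    return written;
}
```

With a threshold of 1, index 0 is the only value that is never written, so
the destination shows through wherever the source uses index 0.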
Z-Buffering Texture Rectangles



Normally, sprites are rendered in a Z sorted list and rendered from back to
front. The Z of each sprite must be maintained by the application and the
application must do the sort each frame. Another technique is to use the
z-buffer to determine priority.
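
The application-side sort can be sketched with the standard library qsort().
The Sprite type is hypothetical, and the sketch assumes that a larger Z means
farther from the viewer.

```c
#include <stdlib.h>

typedef struct {
    int x, y;   /* screen position */
    int z;      /* depth maintained by the application; larger is farther */
} Sprite;

/* qsort comparator: farthest sprite first, so drawing in array order
 * paints from back to front. */
static int sprite_back_to_front(const void *a, const void *b)
{
    const Sprite *sa = (const Sprite *)a;
    const Sprite *sb = (const Sprite *)b;
    return (sb->z > sa->z) - (sb->z < sa->z);
}

/* Sort the sprite list once per frame before issuing rectangle commands. */
static void sort_sprites(Sprite *sprites, size_t count)
{
    qsort(sprites, count, sizeof(Sprite), sprite_back_to_front);
}
```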

Primitive Z 

The texture rectangle has no Z value associated with it directly, however you
can use the primitive Z register (g*DPSetPrimDepth()). To force the z-buffer
logic to use primitive Z rather than pixel Z, you must use the following
command:

gsDPSetDepthSource(G_ZS_PRIM)

You must also use a RenderMode that enables z-buffering, such as
G_RM_ZB_OPA_SURF. To z-buffer sprites, you would have to insert a
g*DPSetPrimDepth() command before the rectangle command of each
sprite. Because the primitive Z is explicitly buffered in the pipeline, it is
not necessary to insert pipe sync commands before setting the register.

Note that z-buffering can only be used in 1 and 2-cycle mode. In copy and
fill mode, you should use the RenderMode G_RM_NOOP to effectively
disable z-buffering and put the pipeline logic in a safe state.

Chapter 16

Antialiasing and Blending



Aliasing is a signal-processing term describing sampling errors that occur 
when a continuous function containing sharp changes in intensity is 
approximated using discrete intensity values. Antialiasing is a method for 
minimizing these errors by using gradations in intensity of neighboring
pixels at edges of primitives, rather than setting pixels to maximum or zero
intensity only. There are many references on antialiasing as it applies to
graphics. This chapter will discuss the method of antialiasing used by the
Reality Co-Processor (RCP). In addition, we will discuss other uses of the
blender hardware. The blender plays a key role in antialiasing, z-buffering,
and transparency effects. After understanding the blender hardware, it may
be possible for a user to come up with new effects by clever programming of
the blender pipeline.

Antialiasing



Antialiasing is an algorithm that attempts to minimize sampling errors that
occur when an edge of a primitive is displayed on a raster image. Visually,
these errors cause the edge to be stair-cased or look jaggy. For scenes with
moderate complexity and/or animation, these jaggies are the source of
high-frequency noise, which is annoying and distracting to users.

Figure 16-1 Edge With and Without Antialiasing

[Figure: the edge of a primitive shown as an aliased edge and as an
antialiased edge]

In Figure 16-2, "Unweighted Area Sampling," on page 303, antialiasing is
achieved by weighting the intensity of the pixel in proportion to the area of
the pixel covered by the edge. In signal-processing terms, this is known as
unweighted area sampling.

Figure 16-2 Unweighted Area Sampling

[Figure: a pixel crossed by the edge of a primitive; the resulting pixel
color is 9/16 black + 7/16 white (the primitive color)]



High-end graphics machines typically use an antialiasing technique known
as super-sampling, in which the pixel is divided into a grid of subpixels. A
color is computed for each subpixel and the subpixels that are covered by a
primitive are averaged to produce the final pixel color. In the case where
more than one primitive covers a pixel, each primitive's color is weighted by
the number of subpixels it covers. Also, depth (Z) can be found for each
subpixel which allows antialiased interpenetrations between primitives.
While super-sampling is straightforward and effective, it is also expensive in
terms of memory and memory bandwidth. For a 4x4 subpixel grid, 16 color
and Z values must be stored for each pixel. In addition, to achieve required
fill rates, each of these values must be accessed every clock.

Because the Nintendo 64 machine has very severe cost and memory
requirements, a new technique for antialiasing was needed that avoided (as
much as possible) the storage requirements of super-sampling yet
provided satisfactory antialiasing. This method relies heavily
on the notion that different objects have different antialiasing needs, and that
the hardware can be simplified by requiring that different RenderModes are
configured as appropriate for a particular object. As well, there are
display-order restrictions for rendering certain types of objects. For
example, transparent objects must be rendered after all the opaque objects.
Finally, it was recognized that antialiasing of silhouettes could be done as a
post process during video output. A data flow diagram of the antialiasing
algorithm is shown in Figure 16-3, "Antialiasing Data Flow," on page 304.
Note that this method requires, in addition to the pixel color and Z value,
three bits of coverage and four bits of deltaZ per pixel, quite small when
compared with super-sampling methods.

Figure 16-3 Antialiasing Data Flow

The antialiasing data flow shows the most general case for z-buffered and
antialiased primitives. Other techniques are possible. For example, if the
database is sorted and rendered in back to front order, non-z-buffered
antialiasing can be used. All of the various types of antialiasing are
discussed in detail in "Blender Modes and Assumptions" on page 327.

For each pixel, a subpixel mask is computed. This mask is a 4x4 grid of bits
where the bit is one if the subpixel is covered by the primitive and zero if the
subpixel is not covered. The mask is converted to a coverage value by
adding all the bits of the mask together. Since we only have three bits of
coverage, the sixteen subpixels must be dithered to eight. The coverage
value is optionally combined with the pixel's alpha value. This is useful for
antialiasing edges created by a texture cut-out. In the blender, the pixel color
and the last value stored for the pixel in memory are combined. The blender
also combines the pixel coverage and memory coverage and does
z-buffering. The blender typically performs operations such as antialiasing
the interior edges of objects and transparency. The new pixel's color,
coverage, and Z are stored in the frame buffer. The Video Interface (VI) reads
the pixel color and coverage and antialiases the silhouettes of objects.

We will now discuss each hardware unit in the antialiasing datapath in
isolation, before considering how these units work together to render a
complete image.

Coverage Unit



The coverage calculation, as described previously, produces a 4-bit number
for each pixel that indicates how much of the pixel was covered by a
primitive. For example, a value of 8 (1.0) indicates the pixel was fully
covered. A value of 1 (0.125) indicates only one-eighth of the pixel was
covered. An example of the coverage calculation is shown in Figure 16-4,
"Coverage Calculation," on page 306.

Figure 16-4 Coverage Calculation

[Figure: a 2x2 block of pixels, each with a 4x4 subpixel mask, and the
coverage dither mask 0xa5a5]

coverage = sum(0x8cce & 0xa5a5) = 4
coverage = sum(0xffff & 0xa5a5) = 8
coverage = sum(0x037f & 0xa5a5) = 4
coverage = sum(0xffff & 0xa5a5) = 8
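
The sums shown in Figure 16-4 can be reproduced with a short software
sketch. The coverage() function is a hypothetical model of the calculation
(AND the 16-bit subpixel mask with the dither mask, then count the set bits);
it is not a description of how the hardware is built.

```c
#include <stdint.h>

#define COVERAGE_DITHER_MASK 0xa5a5u   /* value shown in Figure 16-4 */

/* Count the covered subpixels that survive the dither mask.
 * The result ranges from 0 (not covered) to 8 (fully covered). */
static int coverage(uint16_t subpixel_mask)
{
    uint16_t bits = subpixel_mask & COVERAGE_DITHER_MASK;
    int count = 0;
    while (bits) {
        count += bits & 1u;
        bits >>= 1;
    }
    return count;
}
```

Because the dither mask keeps exactly eight of the sixteen subpixels, a fully
covered pixel yields 8.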



Note that it is very important that primitives sharing an edge have
complementary subpixel masks, otherwise cracks may appear between
edges. In the RCP, if primitives use the same vertices to create the
primitive, then the pixel mask will be complementary. There are, however,
cases where bad modelling can lead to cracks, as in Figure 16-5,
"Complementary Edges," on page 307. These cases can occur when
(incorrectly) fractalizing terrain or (incorrectly) generating triangles from
NURBS surfaces, for example.

Figure 16-5 Complementary Edges

[Figure: edges that share vertices will join correctly; edges that do not
share vertices are not guaranteed to join correctly]

Z Stepper



The Z-stepper calculates an 18-bit fixed point depth value (Z) for each pixel
of a primitive. The value of Z is normally zero at the near plane and
maximum at the far plane, assuming a proper g*SPViewport() command. By
manipulating the g*SPViewport() command, it is possible to split the z-buffer
into separate Z-planes, see Figure 16-6, "Z-Buffer Planes," on page 308.

Figure 16-6 Z-Buffer Planes

[Figure: a single Z range, with Near at Z=0 and Far at Z=MAXZ]

static Vp vp = {
        SCREEN_WD*2, SCREEN_HT*2, G_MAXZ/2, 0, /* scale */
        SCREEN_WD*2, SCREEN_HT*2, G_MAXZ/2, 0, /* translate */
};

...gsSPViewport(&vp),

[Figure: obj0 and obj1 in two Z-planes, with Near0 at Z=0, Far0/Near1 at
Z=MAXZ/2, and Far1 at Z=MAXZ]

static Vp vp0 = {
        SCREEN_WD*2, SCREEN_HT*2, G_MAXZ/4, 0, /* scale */
        SCREEN_WD*2, SCREEN_HT*2, G_MAXZ/4, 0, /* translate */
};
static Vp vp1 = {
        SCREEN_WD*2, SCREEN_HT*2, G_MAXZ/4, 0, /* scale */
        SCREEN_WD*2, SCREEN_HT*2, G_MAXZ/2, 0, /* translate */
};

...gsSPViewport(&vp1), /* render object in second Z-plane */
...gsSPViewport(&vp0), /* render object in first Z-plane */



No attempt will be made to justify why one would do this, only that it is
possible. Also, note that the g*SPPerspNormalize() command can be used to
maximize Z precision. See Figure 12-2, "Perspective Normalization
Calculation," on page 146 for more details about g*SPPerspNormalize().
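
The depth range selected by a viewport's Z scale and translate can be checked
with a small calculation. The sketch assumes that the projected Z lies in
[-1, 1] before the viewport transform, so the output range is the translate
minus and plus the scale; the ZRange type and viewport_z_range() function are
hypothetical names.

```c
typedef struct { double near_z, far_z; } ZRange;

/* Depth range produced by a viewport Z scale and translate, assuming the
 * projected Z lies in [-1, 1] before the viewport transform. */
static ZRange viewport_z_range(double z_scale, double z_translate)
{
    ZRange r;
    r.near_z = z_translate - z_scale;
    r.far_z  = z_translate + z_scale;
    return r;
}
```

Under this assumption, a scale and translate of half the maximum Z cover the
whole z-buffer, a scale and translate of one quarter cover the near half, and
a scale of one quarter with a translate of three quarters would cover the far
half.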
There is also a source of constant Z (from a register) that can be set using
the g*DPSetPrimDepth() command. To select the constant depth, use the
g*DPSetDepthSource() command. This may be useful when z-buffering
sprites, for example.



The Z value is subpixel corrected so that it is always calculated on the
primitive. To see why this is necessary consider Figure 16-7, "Subpixel
Correction of Z," on page 309:

Figure 16-7 Subpixel Correction of Z

[Figure: the view frustum and the projected view of a primitive. At the
center of the pixel, Z is negative (projects behind the viewpoint); at the
horizon line, Z = infinity.]



In this case, if you calculate Z at the center of the pixel, the Z value will
be negative because Z will be projected behind the viewpoint. A better
solution is to calculate the Z value at the subpixel, below the center of the
pixel in this case, which intersects the primitive.

Color Blend Hardware

The blend mux selects input operands for the blender hardware. The
controls for these muxes are in the g*DPSetOtherModes modeword. There
are two sets of mux controls, one for each of the two possible rendering
cycles.

The blend equation is of the form: 

Equation 1 Blend Equation

color = (a x p + b x m) / (a + b)



The reasoning behind this equation will become evident in the discussion of
the antialiasing algorithm later in this document.
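
For readers who prefer code to notation, the blend can be sketched for one
color channel. This assumes the blend equation has the form
(a x p + b x m) / (a + b), uses normalized floating-point values rather than
the hardware's fixed-point arithmetic, and uses a hypothetical function name.

```c
/* Blend one color channel: (a * p + b * m) / (a + b).
 * p is the pixel color, m the memory color, a and b their weights.
 * All values are normalized to 0.0..1.0; a + b is assumed to be nonzero. */
static double blend_channel(double a, double p, double b, double m)
{
    return (a * p + b * m) / (a + b);
}
```

When b equals 1.0 - a the divisor is 1.0, so the numerator alone is already
the blended color.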

The four input operands (p, a, m, b) each have four possible sources so two
bits are needed to control each mux. This gives a total of 8 bits per cycle of
blend mux control. Since the pipeline can operate in one or two cycle mode
(see g*DPSetCycleType()) the blender must select which of the sets of mux
controls to use depending on the cycle type (G_CYC_1CYCLE or
G_CYC_2CYCLE) and an internal cycle counter. The sources for the p and
m muxes are identical and are shown in Table 16-1, "P and M Mux Inputs,"
on page 310.

Table 16-1 P and M Mux Inputs

Mux Select    Source

0             first cycle - pixel RGB, second cycle -
              blended RGB from first cycle
1             memory RGB
2             blend (register) RGB
3             fog (register) RGB
For select 0, the cycle select is built into the hardware. The "blended RGB"
refers to the numerator result of the blend equation, Equation 1, on the first
cycle (it's fed back as an input). Note that this will only work if the b mux is
set to 1.0 - a, since only the numerator of the blend equation is provided to
the input mux. Register RGBs refer to colors which can be set using the
g*DPSetFogColor() and g*DPSetBlendColor() commands. Colors set using
these commands are stored in registers within the RDP. Care must be taken
to make sure that a g*DPPipeSync() command is issued before setting
these registers. The g*DPPipeSync() command inserts a delay into the RDP
pipe so that a previous primitive is guaranteed to be finished processing
before the register is updated. It is anticipated that the user will set a group
of attributes, process many primitives, set a new group of attributes, etc. The
syncs are exposed to the user, who can more likely determine the minimum
number of syncs needed than would be possible in hardware. (Note that
primitive color, g*DPSetPrimColor(), primitive depth, g*DPSetPrimDepth(),
and scissor, g*DPSetScissor(), are attributes that do not require any syncs.)

The sources for the a muxes are shown in Table 16-2, "A Mux Inputs," on
page 311.

Table 16-2 A Mux Inputs

Mux Select    Source

0             color combiner output alpha
1             fog (register) alpha
2             (stepped) shade alpha
3             0.0



The sources for the b muxes are shown in Table 16-3, "B Mux Inputs," on
page 311.

Table 16-3 B Mux Inputs

Mux Select    Source

0             1.0 - 'a mux' output
1             memory alpha
2             1.0
3             0.0



In general, the RDP pipeline operates on RGBA pixels with 8 bits per
component. The 1.0 in Table 16-3, "B Mux Inputs," on page 311 assumes the
alpha is a number between 0.0 and 1.0. These numbers are actually fixed point
and the outputs of the a and b alpha muxes have less resolution (5 bits) than
the color components (8 bits) to reduce hardware cost. When this alpha is
changing slowly across a face, Mach banding can occur due to the reduced
number of discrete steps in the alpha channel.

Two dither commands can be used to reduce Mach banding effects:
g*DPSetColorDither() and g*DPSetAlphaDither(). These commands basically
add a small amount of randomness (1/2 of an LSB) to the color and/or alpha,
which makes the Mach banding less noticeable. The g*DPSetColorDither()
command also controls the dithering of RGB from 8 to 5 bits per component
(for use in 5/5/5/1 pixel mode).

There are two variations of dithering that can be set using the
g*DPSetColorDither() command. One is a screen coordinate based dither
(G_CD_MAGICSQ or G_CD_BAYER) in which the dither matrix changes
based on the location of the pixel on the screen. In other words, the dither
pattern is registered to the screen. The noise dither (G_CD_NOISE), on the
other hand, adds pseudo-random noise with a very long period into the
LSBs of each pixel. In this mode, the dithering is not registered to the screen
and will vary from frame to frame. Of course, you can disable color
dithering altogether using the G_CD_DISABLE parameter.

Alpha dithering (g*DPSetAlphaDither()) for screen-based dither patterns
uses the same matrix that is selected by the g*DPSetColorDither() command.
However, the user may invert the pattern, G_AD_NOTPATTERN, or simply
pass the pattern through unchanged, G_AD_PATTERN. The user may also
select the noise pattern using G_AD_NOISE, or disable alpha dithering
altogether using G_AD_DISABLE.

Note: The dithering of the RGB from 8 bits to 5 bits by adding 3 LSBs of noise
to the original 8 bits (with clamping to prevent wrapping) is enabled even in
32 bit mode (8/8/8/8), where there is no truncation to be done. Since this
one mode bit controls both RGB dither and alpha dither (which always is
needed, even in 32 bit mode), opaque things should have the dither bit off in
32 bit mode (so the 3 LSBs don't get stepped on), but transparent things
should have this bit on in 32 bit mode, since the noise from the alpha will be
of the same order as the noise gratuitously added to the RGB.
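To make the screen-registered dither concrete, here is a minimal sketch of reducing one 8-bit component to 5 bits by adding up to 3 LSBs of position-dependent noise, clamping, and truncating. The 4x4 matrix is the textbook Bayer matrix and the function name is hypothetical; the actual RDP matrices are not given in this section.

```c
#include <assert.h>

/* Textbook 4x4 Bayer matrix (values 0..15). This is an assumption for
 * illustration only, not the matrix used by the RDP. */
static const unsigned char bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Dither an 8-bit component down to 5 bits at screen position (x, y). */
static unsigned char dither_8_to_5(unsigned char c8, int x, int y)
{
    assert(x >= 0 && y >= 0);
    int noise = bayer4[y & 3][x & 3] >> 1;   /* 0..7: the 3 LSBs */
    int sum = c8 + noise;
    if (sum > 255)
        sum = 255;                           /* clamp to prevent wrapping */
    return (unsigned char)(sum >> 3);        /* keep the top 5 bits */
}
```

Because the noise depends only on (x, y), the pattern is registered to the screen; a noise dither would replace the matrix lookup with a pseudo-random value.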



Fog

Suppose we want to "fog out" from a shaded image to a constant color as a
function (set up in the RSP) of depth. We will assume the fog parameter is
set up (per vertex) in the stepped alpha of the shaded triangle primitive (see
"Vertex Fog State" on page 169). We will use the fog register color
(g*DPSetFogColor()) as the color to fade to. We will use the stepped shade
alpha as a control to determine how much of the fog color is used. The first
cycle blend mux selects in Table 16-4, "Fog Mux Controls," on page 313 will
achieve this effect.

Table 16-4 Fog Mux Controls

Mux    Source Selected

P      select 0, pixel RGB
A      select 2, stepped shade alpha
M      select 3, fog register color
B      select 0, 1.0 - stepped shade alpha

From the blend equation, Equation 1, you can see that these selects perform
a linear interpolation between the fog color and the color combiner output
color.

Equation 2 Fog Blend Equation

color = (fogparam × pixclr + (1.0 - fogparam) × fogclr) /
        (fogparam + (1.0 - fogparam))



The command g*DPSetRenderMode() is used to control these muxes as well
as other blender modes. The command
g*DPSetRenderMode(G_RM_FOG_SHADE_A, G_RM_FOG_SHADE_A)
implements the mux controls for this fog effect in G_CYC_1CYCLE mode.
Typically, this effect would be used only in G_CYC_2CYCLE mode, with the
second cycle performing the blend of the pixel with memory. For example,
g*DPSetRenderMode(G_RM_FOG_SHADE_A, G_RM_AA_ZB_OPA_SURF2)
enables fog while rendering antialiased, z-buffered, opaque surfaces. In
G_CYC_1CYCLE mode, only the fogging operation would be performed
(no blend).
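The fog interpolation can be sketched in a few lines of C. This follows the mux selects of Table 16-4 as printed (P = pixel RGB, A = stepped shade alpha, M = fog color, B = 1.0 - A), so the denominator of the blend equation is 1.0. The function name and the 8-bit alpha scale are assumptions for illustration.

```c
#include <assert.h>

/* Fog one 8-bit component: alpha weights the pixel color and
 * (255 - alpha) weights the fog color, per Table 16-4 as printed.
 * alpha is the stepped shade alpha on an assumed 0..255 scale. */
static int fog_component(int pixel, int fog, int alpha)
{
    assert(pixel >= 0 && pixel <= 255);
    assert(fog >= 0 && fog <= 255);
    assert(alpha >= 0 && alpha <= 255);
    return (alpha * pixel + (255 - alpha) * fog + 127) / 255;
}
```

With these selects an alpha of 255 leaves the pixel color untouched and an alpha of 0 yields pure fog color.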



Coverage Calculation

From the previous discussion in "Coverage Unit" on page 306, coverage is
a 4-bit value that indicates how many subpixels are occluded by a primitive.
Note that a coverage of zero indicates that no subpixels were covered and
the pixel does not need to be written to the frame buffer. Because there are
only 3 bits of coverage available in the frame buffer, the coverage stored is
actually:

Equation 3 Stored Coverage 

memcvg = coverage - 1



When the pixel is read from memory, a one is automatically added to restore
the actual coverage before it is used in calculations.
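The store and restore steps are simple enough to sketch directly. The helper names are hypothetical, and the 1..8 coverage range is an assumption consistent with a 4-bit coverage count and a 3-bit stored field.

```c
#include <assert.h>

/* Pack a nonzero coverage count (1..8 subpixels) into the 3-bit
 * frame buffer field: memcvg = coverage - 1. */
static unsigned store_coverage(unsigned coverage)
{
    assert(coverage >= 1 && coverage <= 8);
    return coverage - 1;              /* 0..7 */
}

/* Restore the actual coverage when the pixel is read back. */
static unsigned load_coverage(unsigned memcvg)
{
    assert(memcvg <= 7);
    return memcvg + 1;                /* 1..8 */
}
```

A coverage of zero never needs to be stored because such a pixel is not written at all, which is what frees the encoding to start at one.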

It is interesting to note that the Video Filter is concerned primarily with
partially covered pixels around the silhouette edges of objects (see "Video
Filter" on page 326). Also, the antialiasing performed by the blender uses
information about coverage wraps, i.e. when the sum of memory coverage
and pixel coverage is greater than 1.0. Because of this, the frame buffer is
initially cleared such that the coverage bits are all one, see "Color Image
Format" on page 318.



Alpha Compare Calculation

From "Fill Mode" on page 180 and "Copy Mode" on page 180, you will
notice that in G_CYC_COPY and G_CYC_FILL modes the blender
hardware is bypassed and the fill color or image is written with no
opportunity for read/modify operations.

Note: When rendering in G_CYC_COPY or G_CYC_FILL, you should use
the render mode G_RM_NOOP to make sure that reading of Z and color is
disabled.

You can achieve a texture edge effect in G_CYC_COPY mode, however, by
using the pixel alpha thresholded with the blend register alpha
(g*DPSetBlendColor()). Figure 16-8, "Alpha Compare in Copy Mode for 8-bit
Framebuffer," on page 316 shows that write enables are generated when the
texel alpha is greater than or equal to blend alpha for 8-bit framebuffers.
Also, note that for 16-bit RGBA texels there are no compares; the alpha bit
simply acts as a write enable. Threshold alpha compare mode may be set by
the following command: g*DPSetAlphaCompare(G_AC_THRESHOLD).

Note: Alpha compare only works in G_CYC_COPY mode for the 16-bit
RGBA color and 8-bit image types. You cannot copy the 32-bit RGBA color
image type.
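The two copy-mode cases can be sketched as follows. The function names are hypothetical; the 8-bit case compares the texel alpha against the blend register alpha, and the 16-bit case assumes the 5/5/5/1 layout with the alpha bit in the least significant position.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit frame buffer: write enable when texel alpha >= blend alpha. */
static int copy_write_enable_8(uint8_t texel_alpha, uint8_t blend_alpha)
{
    return texel_alpha >= blend_alpha;
}

/* 16-bit RGBA (5/5/5/1): no compare, the alpha bit is the write enable.
 * Alpha in bit 0 is an assumption about the texel layout. */
static int copy_write_enable_16(uint16_t texel)
{
    return (texel & 0x0001u) != 0;
}
```

Raising the blend alpha therefore discards progressively more of the texture, which is what produces the cut-out edge.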

Figure 16-8 Alpha Compare in Copy Mode for 8-bit Framebuffer

[Figure: texel alphas read from texture memory are each compared against a
threshold, either the blend alpha or a random alpha as selected by
gDPSetAlphaCompare, to generate the write enables we0 through we3.]



Another alpha compare mode uses a hardware generated pseudo-random
number as the threshold alpha. To set this mode, use
g*DPSetAlphaCompare(G_AC_DITHER).

Both G_AC_DITHER and G_AC_THRESHOLD can be used in
G_CYC_1CYCLE or G_CYC_2CYCLE mode as well. In these modes, you
can readily change the pixel's alpha from frame to frame, allowing various
fade effects. In order to get the alpha of the pixel to the comparators, you
must set the CVG_X_ALPHA and ALPHA_CVG_SEL bits properly;
Figure 16-9, "Alpha Compare in One/Two-Cycle Mode," on page 317 shows
a block diagram of the coverage/alpha combiner and alpha comparator
logic. These controls are usually set as part of the g*DPSetRenderMode()
command. For example, the command
g*DPSetRenderMode(G_RM_TEX_EDGE, G_RM_TEX_EDGE2) will do the
right thing with these mode bits. See Table 16-6 for details on which bits are
set for a particular RenderMode.

For rendering effects such as smoke, clouds, or explosions, set the texture
alpha to the outline of the smoke or explosion and render the texture onto a
transparent polygon so that one can see through the smoke to the objects
behind.

In this situation, the correct g*DPSetRenderMode() to use is
G_RM_ZB_CLD_SURF or G_RM_CLD_SURF.

This 'cloud' mode preserves the antialiasing of objects behind the cloud
primitive, unlike TEX_EDGE and XLU_SURF modes.

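Figure 16-9 Alpha Compare in One/Two-Cycle Mode

[Figure: block diagram of the coverage/alpha combiner and alpha comparator.
Inputs are the combined alpha, the key (with key mode), and the coverage;
the CVG_X_ALPHA and ALPHA_CVG_SEL controls and g*DPSetAlphaCompare
select what is passed on as pixel coverage and pixel alpha to the blender.]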



Blender ADD Mode

A special blender mode has been implemented that allows the pixel color to
be added to the memory color:



#define RM_ADD(clk) \
    IM_RD | CVG_DST_SAVE | FORCE_BL | ZMODE_OPA | \
    GBL_c##clk(G_BL_CLR_IN, G_BL_A_FOG, G_BL_CLR_MEM, G_BL_1)
#define G_RM_ADD    RM_ADD(1)
#define G_RM_ADD2   RM_ADD(2)

Several notes about this mode:

• You must set fog alpha equal to 0xff for this mode to work, e.g.
  gsDPSetFogColor(255, 255, 255, 255).

• Since the blender does not clamp the final color (all the inputs are
  clamped and normal interpolation operations won't under/overflow),
  the user must guarantee that the results will not overflow or
  "special effects" may occur.



Color Image Format

There are three color image formats: 32-bit RGBA, 16-bit RGBA, and 8-bit. In
addition, there are hidden bits that are available to the RDP memory interface
but not readily visible to the programmer, see Figure 16-10, "Hidden Bits,"
on page 319. These hidden bits come from the fact that the RCP uses 9-bit
RDRAMs. For 16-bit RGBA types, the hidden bits are used for storing
coverage. For 32-bit RGBA types, the 3 coverage bits are stored as the 3
MSBs of the 8-bit alpha channel and the hidden bits are ignored. Note that
the 32-bit RGBA mode does not provide increased alpha resolution. For
8-bit color images, the hidden bits are ignored.

The hidden bits are logically the 2 LSBs of each 18-bit word. For memory
accesses from other than the RDP memory interface (MI), only a 16-bit word
is read/written. Other masters can indirectly set or clear the hidden bits by
setting or clearing the LSB of the 16-bit word, respectively. For example, if
the CPU writes the 16-bit binary value 10101010_10101010 to memory, the
memory interface will actually write the 18-bit binary value
10101010_10101010_00. On the other hand, if the CPU writes the 16-bit
binary value 01010101_01010101, the memory interface will actually write
the 18-bit binary value 01010101_01010101_11.
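The rule for non-RDP writes can be modeled in a few lines. This is a sketch of the behavior described above, with a hypothetical function name: both hidden bits take the value of the LSB of the 16-bit word being written.

```c
#include <assert.h>
#include <stdint.h>

/* Model of what the memory interface stores when a master other than
 * the RDP writes a 16-bit word: the word is followed by 2 hidden bits
 * that copy the word's LSB. The result is an 18-bit value. */
static uint32_t cpu_write_to_18bit(uint16_t value)
{
    uint32_t hidden = (value & 1u) ? 0x3u : 0x0u;
    return ((uint32_t)value << 2) | hidden;
}
```

The two examples in the text correspond to cpu_write_to_18bit(0xAAAA) and cpu_write_to_18bit(0x5555).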
Figure 16-10 Hidden Bits

[Figure: short ordering, byte ordering, and bit ordering of a 16-bit word
with its 2 hidden bits, followed by the 16-bit RGBA format (pixel ordering,
byte ordering, number of bits, components, bit ordering) showing the 2
hidden bits.]

Note: Hidden bits are only read/written directly by the RDP memory
interface. They are logically positioned as the LSBs of every 16-bit
word, independent of Color Image type.



Figure 16-11, "Color Image Formats," on page 320 describes the logical
frame buffer formats.

Image Alignment Requirements

The color image pointer, g*DPSetColorImage(), and the depth image pointer,
g*DPSetDepthImage(), should be aligned to 64 bits, i.e. the 3 LSBs of the
pointer should be zero.
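A debug build might check this requirement before handing a buffer to the RDP. The helper below is a hypothetical sketch, not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero when the 3 LSBs of the pointer are zero,
 * i.e. the address is aligned to 64 bits (8 bytes). */
static int is_64bit_aligned(const void *p)
{
    return ((uintptr_t)p & 0x7u) == 0;
}
```

A typical use would be an assert on the frame buffer and Z-buffer addresses at initialization time.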

Figure 16-11 Color Image Formats

[Figure: pixel ordering, byte ordering, number of bits, components, and bit
ordering for each color image format.]

Z Calculation 

As mentioned in the "Z Stepper" section, g*DPSetDepthSource() selects the
source of Z for the depth compares used in the z-buffer algorithm. This
selects between primitive Z (a register), g*DPSetPrimDepth(), and stepped Z
(from the triangle or line). g*DPSetDepthSource() also selects between
primitive DeltaZ (a register) and stepped DeltaZ. The 16-bit primitive Z
register can supply the 15 integer bits of the Z value and the 16-bit DeltaZ
register can supply the 16 bits of the DeltaZ value.

For each z-buffered primitive, the amounts of Z change per pixel in the X and
Y directions are calculated as part of setup. These values are used
in the z-buffer logic of the blender to create a composite DeltaZ for the pixel:

Equation 4 DeltaZ Calculation

DeltaZpix = |dZdx| + |dZdy|

The DeltaZ value is important in determining surface correlation, that is,
whether this pixel is part of the same surface as the pixel that is stored in
memory. When computing whether the pixel is part of the same surface, the
worst case DeltaZ is used:

Equation 5 Max DeltaZ Calculation 

DeltaZmax = MAX(DeltaZpix,DeltaZmem) 



The z-buffer compare equations are:

Equation 6 Max Z Test

MaxZ = (MemZ = MAXZ)

Equation 7 Farther Compare

Farther = (PixZ + DeltaZmax) > MemZ

Equation 8 Nearer Compare

Nearer = (PixZ - DeltaZmax) < MemZ

Equation 9 In Front Compare

InFront = PixZ < MemZ

These signals are used along with coverage information to determine
surface correlation for various antialiasing modes. See "Blender Modes and
Assumptions" on page 327.
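The compare signals can be written out as a small C sketch. The struct, the function names, and the value chosen for MAXZ (the largest 18-bit number) are assumptions for illustration; the comparisons themselves follow the equations as printed.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXZ 0x3ffffL   /* assumption: largest 18-bit Z value */

typedef struct {
    int max_z;      /* memory Z is at the maximum value */
    int farther;    /* PixZ + DeltaZmax > MemZ */
    int nearer;     /* PixZ - DeltaZmax < MemZ */
    int in_front;   /* PixZ < MemZ */
} ZSignals;

/* Composite DeltaZ for the pixel: |dZdx| + |dZdy|. */
static long delta_z_pix(long dzdx, long dzdy)
{
    return labs(dzdx) + labs(dzdy);
}

static ZSignals z_compare(long pix_z, long dz_pix, long mem_z, long dz_mem)
{
    assert(dz_pix >= 0 && dz_mem >= 0);
    long dz_max = dz_pix > dz_mem ? dz_pix : dz_mem;  /* worst case DeltaZ */
    ZSignals s;
    s.max_z    = (mem_z == MAXZ);
    s.farther  = (pix_z + dz_max) > mem_z;
    s.nearer   = (pix_z - dz_max) < mem_z;
    s.in_front = pix_z < mem_z;
    return s;
}
```

When both farther and nearer are true, the new pixel lies within the DeltaZ range of the stored one, which is the situation the surface correlation logic is looking for.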



Z Image Format 

The Z-buffer logic in the blender uses a fixed point, 0,15.3, 18 bit number for
Z calculations. The delta Z is a 16 bit quantity that is used as an s15 number.
The linear 18-bit Z that is stepped is converted to a 14 bit floating point
format before being stored. This encoding is shown in Figure 16-12, "Z
Encoding," on page 322.

Figure 16-12 Z Encoding

[Figure: the stepped Z (0,15.3) is encoded as a 3-bit exponent and an 11-bit
mantissa.]



Three bits are stored for the exponent and 11 bits are stored for the mantissa.
Here is some pseudo code for converting from the format stored in memory
to the Z format used in calculations:

    /*
     * Convert 11 bit mantissa and 3 bit exponent
     * to 0,15.3 number
     */
    struct {
        int shift;
        long add;
    } z_format[8] = {
        { 6, 0x00000 },
        { 5, 0x20000 },
        { 4, 0x30000 },
        { 3, 0x38000 },
        { 2, 0x3c000 },
        { 1, 0x3e000 },
        { 0, 0x3f000 },
        { 0, 0x3f800 },
    };

    zvalue = (mantissa << z_format[exponent].shift) +
             z_format[exponent].add;



Notice that when converting from an 18 bit fixed point number to a 14 bit
floating point number, some precision may be lost. The loss of precision is
greatest for small exponents. The highest precision is saved for large Z
values, that is, for objects that are far away from the eye.
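For illustration, the conversion in the other direction (18-bit fixed point to exponent and mantissa) can be derived from the same shift/add table. This is a sketch under assumptions: the encoder is inferred as the inverse of the decode shown in this section, and packing the exponent above the mantissa in a 14-bit integer is just a convenient layout for the example.

```c
#include <assert.h>

/* Shift/add pairs for each exponent, as in the decode pseudo code. */
static const struct { int shift; long add; } z_fmt[8] = {
    { 6, 0x00000 }, { 5, 0x20000 }, { 4, 0x30000 }, { 3, 0x38000 },
    { 2, 0x3c000 }, { 1, 0x3e000 }, { 0, 0x3f000 }, { 0, 0x3f800 },
};

/* Encode an 18-bit Z as (exponent << 11) | mantissa. The packed layout
 * is an assumption made for this sketch. */
static unsigned z_encode(long z)
{
    assert(z >= 0 && z <= 0x3ffffL);
    int exponent = 7;
    while (exponent > 0 && z < z_fmt[exponent].add)
        exponent--;                       /* largest range that contains z */
    unsigned mantissa =
        (unsigned)((z - z_fmt[exponent].add) >> z_fmt[exponent].shift);
    return ((unsigned)exponent << 11) | (mantissa & 0x7ffu);
}

/* Decode back to 18-bit Z; the low 'shift' bits come back as zero. */
static long z_decode(unsigned packed)
{
    unsigned exponent = (packed >> 11) & 0x7u;
    unsigned mantissa = packed & 0x7ffu;
    return ((long)mantissa << z_fmt[exponent].shift) + z_fmt[exponent].add;
}
```

A round trip drops the low bits given by the shift for that exponent: six bits for exponent 0 and none for exponents 6 and 7, which matches the statement that precision is best for large Z.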

The DeltaZ is also encoded into a 4 bit integer for storage into the Z-buffer
using the following equation:

Equation 10 DeltaZ Encoding

DeltaZmem = log2(DeltaZpix)

This is just a priority encoding of the DeltaZ value. The bit number of the
most significant bit that has a value of one is stored.
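A priority encoder of this kind is easy to express in C. The function name is hypothetical, and returning 0 for an input of zero is an assumption, since the text does not say how that case is handled.

```c
#include <assert.h>

/* Return the bit number of the most significant set bit of a 16-bit
 * DeltaZ, i.e. floor(log2(delta_z)). The result (0..15) fits in 4 bits.
 * An input of 0 returns 0 (assumption). */
static unsigned delta_z_encode(unsigned delta_z)
{
    assert(delta_z <= 0xffffu);
    unsigned bit = 0;
    while (delta_z > 1) {
        delta_z >>= 1;
        bit++;
    }
    return bit;
}
```

Only the magnitude class of DeltaZ survives the encoding, so values from 0x0080 through 0x00ff all store 7.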
The memory format for the Z and DeltaZmem is shown in Figure 16-13, "Z
Memory Format," on page 324.

Figure 16-13 Z Memory Format

[Figure: pixel ordering, byte ordering, number of bits, components, and bit
ordering of a Z-buffer entry, including the 2 hidden bits.]

Note: Hidden bits are only read/written directly by the RDP Memory
Interface. They are logically positioned as the LSBs of every 16-bit
word.
Z Accuracy

The plot in Figure 16-14 shows the worst-case percent error in Z relative to
the near and far planes.

Figure 16-14 Z Worst-Case Error

[Figure: plot of worst-case Z error; a surviving label identifies one plotted
quantity as eye-space Z.]
Video Filter

The video filter performs the second pass of the antialiasing algorithm. The
first pass is done in the blender and provides antialiasing of internal or
non-silhouette edges. After the image is rendered into the frame buffer, all
pixels except those that are on the silhouette edges of objects will be fully
covered (coverage = 1.0). For partially covered pixels, the video filter
performs a linear interpolation between the foreground color and the
background color:

Equation 11 Video Filter Interpolation

OutputColor = cvg × ForeGround + (1.0 - cvg) × BackGround

The ForeGround color is the color stored in the frame buffer for that
pixel. The BackGround color is found by examining fully covered pixels in a
5x3 pixel area around the current pixel. Note that Z is not used in
determining the BackGround color and so it is safe for Z to be single-buffered.
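The interpolation itself is a one-line computation per color component. The sketch below assumes coverage is expressed as a count of eighths (1..8, with 8 meaning fully covered) and uses a hypothetical function name; how the background color is chosen from the 5x3 neighborhood is not modeled.

```c
#include <assert.h>

/* OutputColor = cvg * ForeGround + (1.0 - cvg) * BackGround,
 * with cvg given in eighths (assumption) and rounding to nearest. */
static unsigned char video_filter(unsigned char fg, unsigned char bg,
                                  unsigned cvg)
{
    assert(cvg >= 1 && cvg <= 8);
    return (unsigned char)((cvg * fg + (8 - cvg) * bg + 4) / 8);
}
```

A fully covered pixel passes through unchanged, so the filter only alters silhouette pixels.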
Blender Modes and Assumptions

Opaque Surface Antialiased Z-Buffer Algorithm, OPA_SURF

The main goal of this algorithm is to produce an antialiased rendering of
polygonal surfaces without the need for sorting. The key to achieving this
goal is to split the antialiasing problem up into several pieces, each of which
is readily implemented.

There are basically three different kinds of antialiasing. The first is the
antialiasing of textures within polygons. This is accomplished outside of the
blender by the texture hardware, using the industry standard mipmapping
technique. This uses tri-linear interpolation to produce a correctly sampled
texture lookup. See "MIP Mapping" on page 232 for more details.

The second kind of antialiasing is the blending of polygon fragments within
the pixels they share. The classic example of this is the pinwheel, where
alternating black and white triangles meet at a center vertex. The pixel
within which this vertex lies should be the average of the colors of all the
triangles which share this vertex, weighted by the area of the pixel at the
vertex covered by each of the triangles.
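The area-weighted average can be accumulated one fragment at a time. The sketch below is an idealized model, assuming floating-point coverage in 0.0..1.0 and applying the blend equation color = (a × p + b × m) / (a + b) with a as the new fragment's coverage and b as the coverage already accumulated in memory. The names are hypothetical and coverage wrapping is not modeled.

```c
#include <assert.h>

typedef struct {
    float color;   /* accumulated color of the pixel */
    float cvg;     /* accumulated coverage, 0.0..1.0 */
} PixelAccum;

/* Blend one fragment of color p and coverage a into the pixel. */
static PixelAccum blend_fragment(PixelAccum mem, float p, float a)
{
    assert(a > 0.0f && mem.cvg >= 0.0f);
    PixelAccum out;
    out.color = (a * p + mem.cvg * mem.color) / (a + mem.cvg);
    out.cvg = mem.cvg + a;
    return out;
}
```

Feeding the same set of fragments in a different order gives the same final color, up to rounding, because each step keeps the running coverage-weighted mean.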

This blending is done in the blender hardware by computing Equation 1,
where p is the color of the pixel of the new poly, m is the color of the pixel in
the frame buffer memory, a is the coverage value of the new poly, and b is
the sum of the coverage values of all the polygons already blended into that
pixel in the frame buffer. Note that no matter what order the polygon
fragments come in, they will all average in correctly.

The third kind of antialiasing is the blending of the silhouette of a
foreground object against the background. This is traditionally done at
rendering time in the blend unit. Unfortunately, doing it at this time has bad
consequences for hidden surfacing.

Consider an internal edge of a surface (i.e., an edge shared by two visible 
polygons not at the silhouette). A priori, when the first of the two polygons 
is rendered, the blender does not yet know whether it is a silhouette edge 
(and hence needs to be blended with the background), or an internal edge 
(and hence should not be blended with the background). Note that if an
internal edge does blend with the background, there will be a line along the
edge left when the second polygon blends with the first. Once the blending
is done, there is no way to undo it. Also, note that the background may not
even have been rendered yet, unless the rendering of polygons is done in
depth-sorted order, which defeats the purpose of z-buffering.

The only way to deal with this is to postpone the blending of silhouette
edges until after the whole scene is rendered. In fact, the final blending of the
silhouette edges is done at display time by the video interface. While the
details of this are beyond the scope of this document, the main point is that
to do this blend on video output, there needs to be a coverage value left
behind in the frame buffer, with which to interpolate between the
foreground (the color of which is in the frame buffer) and the background
(which is assumed to be in one or more of the neighboring pixels in the frame
buffer). This interpolation is described in Equation 11.

Note that for this approach to work, we must be able to distinguish between
internal edges within a surface and silhouette edges between an object and
its background. This is only possible in the context of z-buffering. (If
z-buffering is disabled, then internal edge blending must also be disabled,
since we can no longer distinguish between internal and silhouette edges.)

In order to distinguish between an internal and a silhouette edge, we need,
in addition to the normal z-buffer containing depth information, some
additional information so that we can tell if two polygons sharing a pixel are
within the same surface or not. This added information is the slope of Z
(depth) in screen space. This is computed as shown in Equation 4. The delta
for the old polygon is stored in the frame buffer with the Z. The rule is then:
if the absolute difference in Z between the new polygon and the frame
buffer is less than the max of the new DeltaZ and the frame buffer DeltaZ,
then the new polygon is considered to be part of the same surface as the old
polygon already in the frame buffer. If the new Z is clearly in front, it
overwrites the frame buffer. If it is clearly behind, it is not written at all.

In fact, while this algorithm works as described above, it has some problems. 
First off, we are only representing one fragment per pixel. If there are 
multiple silhouettes within one pixel, there will be a slight artifact. There is 
some specialized hardware to reduce this effect (the divot circuit). However,
some artifacts remain, and are simply tolerated.
The other, and considerably more visually obvious artifact is
"punch through", where part of an object which should have been occluded
"punches through" the object in front of it. This is caused by the z-buffer
blending range being too large, usually due to large DeltaZ's from polygons
that are very "edge on" to the viewpoint. There are two different
mechanisms to prevent this artifact.

The first mechanism is to weight the weighting factors in the internal edge
blend by how "edge on" they are. Polygons that are more "flat" are weighted
more heavily than polygons that are more "edge on". Thus, the
punching-through polygon is attenuated relative to the polygon it is punching
through.

The second mechanism to prevent punch through is to use the wrapping of
the coverage value to distinguish between contiguous surfaces and a "new"
polygon that is not part of that surface. Basically, if the coverage wraps (i.e.,
new cvg + old cvg > 1.0), then the new polygon must not be part of the
previously rendered surface (or background). In that case, instead of using
the DeltaZ range, the z-buffer does a strict compare between the new and
old z, ignoring the deltas, since we know the new polygon is not part of the
old surface.
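The same-surface decision, including the coverage wrap override, might be modeled as below. This is a simplified sketch with hypothetical names; coverage is assumed to be counted in eighths (1..8), so a sum above 8 stands for a sum above 1.0.

```c
#include <assert.h>
#include <stdlib.h>

/* Return nonzero when the new fragment is treated as part of the
 * surface already in the frame buffer. A coverage wrap forces the
 * answer to "no", so the caller falls back to a strict Z compare. */
static int same_surface(long new_z, long new_dz, long old_z, long old_dz,
                        unsigned new_cvg, unsigned old_cvg)
{
    assert(new_cvg >= 1 && new_cvg <= 8 && old_cvg >= 1 && old_cvg <= 8);
    if (new_cvg + old_cvg > 8)
        return 0;                            /* wrap: not the same surface */
    long range = new_dz > old_dz ? new_dz : old_dz;
    return labs(new_z - old_z) < range;      /* within the DeltaZ range */
}
```

Two fragments whose coverages fit together within one pixel and whose depths agree to within the larger DeltaZ are blended as one surface; anything else is resolved by depth alone.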

Note: The silhouette antialiasing part of this algorithm depends on
not having shared edges across the silhouette (shared with the backfacing
polygons adjacent to the silhouette). Consequently, back-facing polygons
must be rejected (culled), or the coverage values at the silhouette edge will
be incorrect for the display-time pass of the antialiasing algorithm. This is
generally desirable in any case, since this saves the rendering time for the
back-facing polygons, which should be invisible. Note that this is only a
problem for closed polygonal surfaces (hulls), but not for "open" surfaces,
like flags, which have "external" edges. So flag-like objects need to be
represented in the display list twice, once frontfacing and once backfacing.



Transparent Surfaces, XLU_SURF 

In addition to opaque surfaces, we would like to be able to do transparent 
surfaces with antialiasing and without the need to sort. There are two 
problems with this. 
The first problem is avoiding sorting. Strictly speaking, this is impossible. In
order for the colors to be correctly blended from multiple, colored
transparent surfaces, the surfaces need to be depth sorted (or carry around a
lot of extra information, more than we have memory for), so we just don't do
the right thing.

We do require all the transparent surfaces to be rendered after the opaque
surfaces, but aside from that segregation, there is no sorting of the
transparent (or opaque) surfaces. So multiple colored transparent surfaces
will not be quite right. First off, this case doesn't come up much (most
transparent surfaces are not colored, and it is rare for multiple transparent
surfaces to line up). Secondly, even if it does, most people have had so little
experience with multiple colored transparency that they don't know what to
expect. Generally speaking, rendering the transparent surfaces in the same
order, regardless of depth, looks just fine.

The second problem with transparency is internal edges. Here, we cannot do
what we did in the opaque surface case. The pixels at an internal edge of a
transparent surface are now blended with the (previously rendered, opaque)
background, as are all the pixels in the interior of the transparent poly. So if
we render one polygon sharing an internal edge, and then render the other
polygon sharing that same edge, we must be sure not to blend any pixel
twice, or there will be a noticeable line on the internal edge as a consequence
of blending twice. So we just don't blend internal edges of transparent
surfaces.

In fact, this is a bit trickier than it seems. We still want the silhouette of a
transparent object to be properly antialiased, so we need to be able to get the
partial coverage values for the silhouette edges, without double blending
the internal edges. This is done with a special mechanism provided just for
transparency.

Under control of a special mode bit (CLR_ON_CVG), we can inhibit the
writing of color (but not coverage) unless the coverage wraps (i.e., the sum
of the old coverage in the frame buffer and the new coverage of the currently
rendering polygon is greater than unity). On an internal edge of a
transparent surface over a fully covered background, the first polygon will
write the color, since full coverage plus any non-zero partial coverage must
wrap. The coverage value is always written with the wrapped sum of the old
pixel and new polygon coverage, which will be equal to the partial coverage
of the new (first) poly. On the rendering of the second poly, however, the
coverage values will sum to unity on the shared edge, which is not a wrap.
So the second polygon will not write over the pixels on the shared edge of
the first poly. Note that this works even if the underlying coverage is not
unity (i.e., the transparent surface is over a pre-rendered silhouette edge),
since still only one of the two transparent polygons sharing an internal edge
will get to write (although it could be the second one instead of the first).
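The shared-edge behavior can be traced with a small model. The struct and function names are hypothetical, and coverage is assumed to be counted in eighths (1..8), so unity corresponds to 8.

```c
#include <assert.h>

typedef struct {
    unsigned cvg;         /* coverage written back to the frame buffer */
    int color_written;    /* nonzero if the color write was allowed */
} CvgResult;

/* Model of CLR_ON_CVG: color is written only when the coverage sum
 * wraps past unity; the wrapped sum is always written as coverage. */
static CvgResult clr_on_cvg(unsigned old_cvg, unsigned new_cvg)
{
    assert(old_cvg >= 1 && old_cvg <= 8);
    assert(new_cvg >= 1 && new_cvg <= 8);
    unsigned sum = old_cvg + new_cvg;
    CvgResult r;
    r.color_written = sum > 8;
    r.cvg = r.color_written ? sum - 8 : sum;
    return r;
}
```

For an edge pixel split 3/8 and 5/8 over a fully covered background, the first polygon wraps (8 + 3), writes its color, and leaves coverage 3; the second sums to exactly 8, does not wrap, and so leaves the color alone.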

The blender in transparent surface mode uses a different form of the blend
equation than for the opaque surface case. The blend equation for
transparency is:

Equation 12

color = a × p + (1.0 - a) × m



where p is the color of the pixel of the new poly, m is the color of the pixel in
the frame buffer memory, and a is the opacity (alpha) of the new poly. Note that
this can be obtained from Equation 1 by setting b = (1 - a).

Note that since we never blend across an internal edge, we do not need to
use the DeltaZ used to condition blending in the opaque surface case.
Instead, we just compare Z directly, since the transparent surface can only be
either clearly in front (in which case it is written with the
transparency-blended color) or clearly behind (in which case it is not written
at all, including coverage).

Note also that unlike opaque surfaces, which modify depth, transparent 
surfaces do not modify depth (although they do read it, to test for occlusion 
by a previously-rendered opaque object). This is because transparent
surfaces do not want to prevent the writing of other transparent surfaces
which are behind them (but in front of any opaque surfaces).



Transparent Lines, XLU_LINE 



In this system, there is no explicit line generation hardware. So lines are 
rendered as degenerate polygons (i.e., a triangle two of whose sides are 
parallel, and whose third vertex is at infinity) using the normal triangle 
hardware. Rendering is very much like the rendering of surfaces. However, 
unlike surfaces, lines have no internal edges (since by definition, a line is an
edge). So here, we don't have to worry about incorrectly blending internal
edges at render time. So for lines, all the antialiasing is done at render time.
Note, however, that as with transparent surfaces, lines must be rendered
after any surfaces they may occlude. In fact, lines are considered intrinsically
transparent. Opaque lines are simply transparent lines with an alpha of
unity (or close to it).

The render-time antialiasing is done by multiplying the new polygon (line)
coverage value with the alpha value, and using this as the alpha to do the
transparency blending. This produces the correct result, due to the absence
of internal edges.

The coverage value written into the frame buffer in line mode is the clamped
sum of the old pixel coverage and the new line's coverage times its alpha.
For nearly opaque pixels, the coverage will be clamped to unity, making any
underlying silhouette edge not be modified by the video interface at the
display-time part of the antialiasing algorithm. This prevents the overlying
line from being disturbed by the underlying (and hence hidden) silhouette
edge. However, if the coverage times alpha from the line is nearly zero, then
the silhouette edge is not disturbed, since it should be visible through the
line.

Lines do read depth, and thus can be occluded by opaque objects. However,
lines, like transparent and decal surfaces, do not modify depth. They are
thus blended in display list order, which for thin lines should not matter.
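
The line-mode rules above (alpha scaled by coverage, then a clamped
coverage sum) can be sketched in C as follows. The Pixel type, the use of
floating-point values in the range 0.0 to 1.0, and the function names are
assumptions made for illustration only.

```c
#include <assert.h>

typedef struct {
    float color;   /* one color channel, 0.0 .. 1.0 */
    float cvg;     /* pixel coverage,    0.0 .. 1.0 (1.0 = unity) */
} Pixel;

static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Render one line fragment over the frame-buffer pixel "mem". */
static void line_pixel(Pixel *mem, float line_color,
                       float line_cvg, float line_alpha)
{
    /* Render-time antialiasing: coverage times alpha is the blend alpha. */
    float a = line_cvg * line_alpha;
    assert(a >= 0.0f && a <= 1.0f);

    mem->color = a * line_color + (1.0f - a) * mem->color;

    /* Coverage written is the clamped sum of old cvg and cvg * alpha. */
    mem->cvg = clamp01(mem->cvg + a);
}
```

A fully covering opaque line drives the stored coverage to unity, while a
nearly transparent fragment leaves the underlying silhouette coverage
almost unchanged.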

Note that "lines" need not be degenerate triangles. In particular, for a "ray"
coming from somewhere in the foreground to a vanishing point at infinity, a
normal triangle, with two vertices at the source of the ray, and the third at
the vanishing point, produces the desired effect. Also note that these "rays"
can be textured, to produce the effect of a diffuse particle beam (or "neon
glow"), or even "tracer bullets" animated by changing texture coordinate
mapping in the texture unit.



Texture Edge Mode, TEX_EDGE

Texture edge mode is the first of the special-purpose modes. It is a variation
of opaque surface mode. It is intended mostly for 'billboard' type objects.
A textured 'billboard' uses alpha values of zero in the texture to define the
outline of an object such as a tree. Either two billboards are crossed, or the
one billboard moves to always face the eyepoint, so as to hide the
two-dimensional nature of the billboard. Frequently, only one bit of alpha
(all or nothing) is available in the highly-packed texture modes usually used
for billboards. Mipmapping can be used to maintain a properly antialiased
tree texture, but at some point the eye can get close enough to the tree
texture to exceed the highest level of detail. In this case the alpha will be
interpolated over several pixels, creating a 'blurry' effect around the
texture edges.

Texture edge mode simply allows the blurred alpha to be written as
coverage. A blurriness in coverage does not produce a blurriness in the
final image, since the backend filter simply ignores the internal partial
coverage bits, recreating a sharp edge.



Decal Surfaces, OPA_DECAL, XLU_DECAL

In order to make the creation of models with complex details as simple as
possible, we added a special mode to allow the rendering of 'decal' polygons
(usually with a texture on them, like a flag or logo) over a previously
rendered opaque surface. Unlike normal rendering, here we only want to
render the decal if it is coplanar with the existing surface. Since we have the
hardware to tell if a surface is (roughly) coplanar from the opaque surface
blend case, we can use that to condition the writes of the decal. Otherwise
the rendering is just like the opaque surface case. Here we rely on the opaque
surface mechanism which conditions blends on the coverage value not
wrapping. This ensures that a decal polygon written over a fully covered
surface will not blend with that surface, but will instead overwrite it.

Internal edges of a decal will, however, be properly blended (with each
other, but not with the underlying surface).

The coverage values of the decal surface wrap (as do opaque and
transparent surfaces). Note that this only works well if the edges of the decal
polygons do not coincide with a silhouette edge of the underlying surface.
If they do coincide, it would help to use clamping for coverage since this will
result in simple aliasing. Using wrap in this case fails miserably, since the
coverage values are double what they should be, with some of them
wrapping and some of them not. However, even clamping is wrong. So
decals should never be allowed to exactly coincide with a silhouette edge of
the underlying surface.

Decal surfaces, like transparent surfaces, do not modify depth, since they are
supposed to be coplanar with the underlying surface, which already has the
correct depth.

Note that there is also a transparent version of decals, for cases where some
of the underlying surface should blend through. This uses the same decal
z-buffering algorithm, but is otherwise like the transparent surface mode.



Decal Lines, DEC_LINE 

This mode also goes by the name "Tron mode", since its main effect is to
exaggerate the polygonalness of an object, making it look more artificial, and
hence more "hi-tech" (at least in the eyes of some artists). Like decal surfaces,
the decal lines are only rendered if they are within the depth range of the
underlying surface, which must be rendered before the decal line.

Aside from the different z-buffer algorithm, the only other difference
between transparent lines and decal lines is the coverage written into frame
buffer memory. For decal lines we do not modify coverage at all. This is so
we do not disturb the antialiasing of the silhouette edges. Note that the half
of the line which is "over the edge" of the silhouette will not be rendered.
Consequently, while the inside edge of the decal line at the silhouette will be
correctly antialiased at render time (as with transparent lines), the outside
edge must still be antialiased at display time by the video interface. The
coverage values at the silhouette are already correct before the decal lines are
rendered. Internal edges are also already correct, since they are fully covered
by the opaque surface rendering.

Note that the decal line case interacts poorly with one of the features of the
video interface (the divot circuit). In particular, if a decal line is on the
silhouette of an object, the divot circuit can disturb the decal lines at the
silhouette. This can be avoided by not using decal lines anywhere they could
be in the silhouette, or by turning off the divot circuit (at the loss of some
antialiasing quality). Or it can simply be tolerated as it is. The effect is a
thinning and breaking up of the decal line at the silhouette. In motion, the
line doesn't scintillate much, and so is probably tolerable.



Interpenetration, OPA_INTER, XLU_INTER

Interpenetration is another special purpose mode, which allows antialiased
interpenetration of polygons to a reasonable approximation, at the cost of
some loss of protection against "punchthrough". This mode is intended for
protrusions ("spikes") through a normal opaque surface, and for terrain, so
the placement of objects (like trees) on the surface of the terrain need not be
precise. Note that in the latter case, the terrain should be the interpenetrating
surface, rendered last (after all the other opaque objects in the foreground).
This ordering both prevents unnecessary punchthrough and renders more
quickly (since the background terrain does not get written if it is behind an
already rendered foreground object). Interpenetration mode should not be
used for articulated figures, or other purposes where the interpenetration is
used to connect what is supposed to be a contiguous surface. If it is used in
this way, unacceptable punchthrough will result. It is probably better in these
cases to use normal opaque surface mode if this is really necessary. The lines
of intersection will alias, but if the two surfaces are roughly the same color,
this may not be too noticeable. Interpenetration mode should not be used
gratuitously. There is both an opaque and a transparent version of
interpenetration mode.

The only down side of this is that interpenetration mode requires using the
wrapping of coverage to select whether to do the coverage adjustment (if it
wraps, and hence is a potentially interpenetrating surface) or not (if it
doesn't wrap, and hence is assumed to be part of the same surface). This can
result in unacceptable punchthrough if any previously rendered objects are
behind and either very edge-on or very near the foreground interpenetration
mode surface. This almost never happens for terrain (where an object is
almost never both occluded and near the terrain surface), and is not terribly
noticeable in the case of small protrusions from a normal opaque surface
object.

Note that interpenetrating polygons must be rendered after the surfaces
which they interpenetrate (which need not themselves have been rendered
in interpenetration mode). Other than that, there are no sorting
requirements.



Particle System Mode, PCL_SURF

The so-called "particle system" mode is really just a clever use of the alpha
dither compare function described above. This is not a true particle system,
where a large number of discrete particles interact to produce some
interesting effect (fire, explosions, water, etc.). This mode is just another
polygonal rendering mode which can be used to make the surface of an
object resemble the behavior of some kinds of particle systems. Note that this
is much more efficient than a "true" particle system, since by this method, a
large number of particles can be represented by a much smaller number of
polygons. The remarkable thing about it is that it produces properly
antialiased silhouettes with correctly rendered internal edges.

This mode is an odd hybrid of the normal 3D opaque surface mode and the
2D alpha dither compare mode. As described in "Alpha Compare
Calculation" on page 315, alpha dither compare (G_AC_DITHER) is a way
of getting "stipple transparency", on a pixel by pixel basis, by allowing a
write of the pixel only if its alpha value is greater than the value of a random
number between 0.0 and 1.0. This makes the probability of a write
proportional to the alpha value, which averaging over many frames
produces the effect of transparency. The most obvious use of this effect is a
"transporter", where the object starts out opaque (alpha = 1.0), but then
fades to nothing (alpha = 0.0) in a cloud of sparkles. With some other effects
added in (textures, inverse transparency, etc.), this mode can also be used for
explosions, fire, and the like. By animating the alphas with texture mapping,
propagating "waves" of alpha can be produced. Due to the human visual
system's predilection for finding patterns whether they are there or not (e.g.,
the "canals" on Mars), even though the "particles" are completely
uncorrelated, the waves of alpha will create the perception of coordinated
behavior among a large number of interacting particles.
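
The write rule behind alpha dither compare can be sketched in C as
follows. The small linear congruential generator is only a stand-in for the
hardware noise source, and all names here are hypothetical; the point is that
the fraction of pixels written tracks the alpha value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in noise source (a simple LCG), NOT the hardware generator. */
static uint32_t lcg_state = 1u;

static float noise01(void)
{
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return (float)(lcg_state >> 8) / 16777216.0f;   /* 24 bits -> [0, 1) */
}

/* A pixel is written only if its alpha exceeds the random value. */
static int dither_write(float alpha)
{
    return alpha > noise01();
}

/* Fraction of "trials" pixels that pass the dither compare. */
static float write_fraction(float alpha, int trials)
{
    int i, hits = 0;
    assert(trials > 0);
    for (i = 0; i < trials; i++)
        hits += dither_write(alpha);
    return (float)hits / (float)trials;
}
```

With alpha = 0.0 nothing is written and with alpha = 1.0 everything is
written; intermediate alphas give a proportional density of "sparkles".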

In this mode, the interior of a polygon is strictly under the control of the
alpha dither compare. The probability of a write is proportional to the alpha
value. The silhouette edge is handled as for opaque surfaces, at display time
in the video interface. The tricky thing is what to do about the internal edges
of a surface.

Note that in this alpha dither compare case, the density of the neighborhood
is a function of alpha. This means that on a shared internal edge, a blend will
only be likely to occur if the alpha value is quite high. In fact, the probability
of a blend is proportional to the square of the alpha value. If the blend
doesn't happen, then the internal edge is treated like a silhouette edge, and
as long as the neighborhood has enough uncovered pixels, the display-time
antialiasing of these partially covered internal edge pixels will do the right
thing. So the only possible problem is with internal edges at high alpha
values, and here, the weighted average will just merge the (nearly
identically colored) fragments from the two polygons with possibly the
wrong weights. But since the two fragments are nearly identical, any error
in weighting doesn't matter.



Blender Modes Truth Table 

The g*DPSetRenderMode() macro sets all of the blender state necessary for
different types of surfaces and antialiasing. The following tables map the
RenderMode arguments to individual mode settings. The macro names used
are from the gbi.h header file.

Mode Bit Descriptions:

AA_EN: if not force blend, allow blend enable - use cvg bits

Z_CMP: condition color write enable on depth comparison

Z_UPD: enable writing of Z if color write enabled

IM_RD: enable color/cvg read/modify/write memory access

CVG_DST[1:0]: 0) clamp if blend_en, new if !blend_en 1) wrap always 2)
zap (force to full cvg) 3) save (don't overwrite memory cvg)

CLR_ON_CVG: only update color on cvg overflow (transp surf)

CVG_X_ALPHA: use alpha times cvg for pixel alpha and cvg

ALPHA_CVG_SEL: use cvg (or alpha*cvg) for pixel alpha

FORCE_BL: force blend enable

ZMODE: 0) opaque 1) interpenetrating 2) transparent 3) decal

alpha_compare_en: condition color write enable on alpha compare, use the
g*DPSetAlphaCompare() command to set.

dither_alpha_en: compare alpha with pseudo-random noise (dithering),
use the g*DPSetAlphaCompare() command to set.

Blender Mux Selects described in Table 16-1, "P and M Mux Inputs," on
page 310, Table 16-2, "A Mux Inputs," on page 311, and
Table 16-3, "B Mux Inputs," on page 311.

Note:

(1) Interpenetration is only meaningful in antialiased z-buffered mode.

(2) Always zap coverage in point-sampled modes.

(3) If CLR_ON_CVG, must also FORCE_BL.

(4) If not CVG_X_ALPHA and ALPHA_CVG_SEL, must not
FORCE_BL.

(5) Always FORCE_BL on non-z-buffered modes.

(6) In opaque surface mode, clamp/new CVG_DST mode works better
on the edges of a decaled surface which closely corresponds to the
edge of the underlying surface. Otherwise, use the wrap CVG_DST
mode.

(7) To place new color regardless of other conditions, use FORCE_BL
with p=don't care; m=pixel color; a=zero; b=one; and don't enable

Table 16-5 enumerates the recommended rendering modes for 3D graphics,
discussed above in some detail. They are what the rendering engine was
primarily designed to do. They produce the best visual quality at
near-optimal efficiency.

Sub surface mode, SUB_SURF, is intended to be used as a way to get an
opaque object upon which an antialiased transparent surface can be
overlaid. The coverage values from the transparent surface will fill in the
zapped coverage values from the initial opaque surface.

The terrain modes, *_TERR, are to get around the modification of the
blending weights by DeltaZ, which was intended for punchthrough
reduction. This causes aliasing of internal edges in cases where the object
faces are non-coplanar. These new modes use the normal lerp blender mode,
which is free of DeltaZ dependence, and hence doesn't alias. Note, however,
that these modes do not handle "pinwheels" correctly, since they assume
that only two polygons meet at any pixel, which is generally not true. But
in the case of terrains, which have very large polygons, this is more nearly
correct.

Table 16-5 Antialiased Z-buffered Rendering Modes, G_RM_AA_ZB

Table 16-6 enumerates modes that are primarily for situations where the
sorting by depth of a scene is trivial, for example, the terrain for a flight
simulator (as long as it is not too mountainous). Otherwise, the cost of
sorting the polygons by depth would be prohibitive. These modes can be
mixed and matched with any of the other rendering modes, z-buffered or
not. Note that for proper antialiasing, polygons should be rendered in
forward painter's algorithm order (back to front), NOT inverse order. (This
is NOT the "a-buffer" algorithm, which requires inverse painter's algorithm
order.) So in a mixed rendering mode scene, any non-z-buffered background
polygons should be rendered first.
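
A minimal sketch of forward painter's algorithm ordering, using the C
library qsort(): polygons are sorted so that the farthest is drawn first. The
Poly type, its fields, and the convention that a larger depth value is farther
away are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int   id;      /* hypothetical polygon handle */
    float depth;   /* assumption: larger value = farther from the eye */
} Poly;

/* qsort comparator: descending depth, so the farthest comes first. */
static int farther_first(const void *a, const void *b)
{
    float da = ((const Poly *)a)->depth;
    float db = ((const Poly *)b)->depth;
    return (da < db) - (da > db);
}

/* Put polygons in back-to-front (forward painter's) order. */
static void sort_back_to_front(Poly *polys, size_t n)
{
    assert(polys != NULL || n == 0);
    qsort(polys, n, sizeof polys[0], farther_first);
}
```

After sorting, the array can be walked from index 0 upward to emit the
non-z-buffered polygons before any z-buffered foreground objects.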

Note that there is no decal surface mode. Since there is no Z to condition the
blend, decal surface mode is identical to opaque surface mode. There is a
decal line mode, since it is slightly different in the way it handles silhouette
edges. Also since there is no Z, there are no interpenetration modes.

The line modes are very similar to the z-buffered line modes, except that
decal line mode zaps coverage to unity. This is because in the non-Z case,
both sides of the line are rendered, and are already correctly antialiased at
render time. For the non-line modes, blending is based on coverage wrap,
since there is no Z to discriminate between new and contiguous surfaces.

Sub surface mode is intended to be used as a way to get an opaque object
upon which an antialiased transparent surface can be overlaid. The coverage
values from the transparent surface will fill in the zapped coverage values
from the initial opaque surface.

The terrain modes are to get around the modification of the blending
weights by DeltaZ, which was intended for punchthrough reduction. This
causes aliasing of internal edges in cases where the object faces are
non-coplanar. These new modes use the normal lerp blender mode, which is
free of DeltaZ dependence, and hence doesn't alias. Note, however, that
these modes do not handle "pinwheels" correctly, since they assume that
only two polygons meet at any pixel, which is generally not true. But in the
case of terrains, which have very large polygons, this is more nearly correct.

Table 16-6 Antialiased Non-Z-Buffered Rendering Modes, G_RM_AA

The point-sampled rendering modes in Table 16-7 are provided for
completeness. They have no significant performance advantage over the
antialiased modes. These modes can be mixed and matched with any of the
other rendering modes, antialiased or not, and so could be used for "special
effects" within an otherwise antialiased scene. Generally speaking, point
sampling looks bad, and should be avoided.

Note that there is no distinction between point-sampled line and surface
modes, since lines and surfaces only differ in the way they are antialiased.
For the same reason there are no point-sampled interpenetration or texture
edge modes.

In the point-sampled modes listed, coverage is usually zapped to unity to
prevent the video interface from trying to antialias them. Note also that in
these modes, because the coverage always wraps (since it is always fully
covered to begin with), surfaces are never blended, and the DeltaZ range is
never used in the z-buffering.

Cloud and overlay surface modes are versions of transparent surface and
transparent decal surface which do not disturb coverage. These are intended
as overlays, where the silhouette of the polygon will have zero opacity, and
hence should not affect the antialiasing of the image. (Note that textures can
still be bilerped, which is the only kind of antialiasing that matters in this
case.)

Table 16-7 Point-Sampled Z-Buffered Rendering Modes, G_RM_ZB

The point-sampled, non-z-buffered rendering modes in Table 16-8 are
provided for completeness. They have no significant performance
advantage over the antialiased modes.

Since there is neither antialiasing nor z-buffering, there is no difference
between lines and surfaces, and no such thing as interpenetration, decals, or
texture edges. Only the transparent surface mode requires the reading of the
frame buffer at render time. The opaque modes simply overwrite the color
and zap the coverage in the frame buffer.

Cloud surface mode, CLD_SURF, is a version of transparent surface mode
which does not disturb coverage. This is intended as an overlay, where the
silhouette of the polygon will have zero opacity, and hence should not affect
the antialiasing of the image. (Note that textures can still be bilerped, which
is the only kind of antialiasing that matters in this case.)

The ADD render mode adds the pixel color to the memory color. Note that
you must set the fog alpha to 0xff for this mode to work, e.g.
gsDPSetFogColor(255, 255, 255, 255). Since the blender does not clamp its
output values (all the inputs are clamped and the normal interpolation
operations won't under/overflow) the user must guarantee that the results
of the add operation will not overflow or weird results (effects?) may occur.
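
Since the application has to guarantee that the add cannot overflow, a
host-side check along the following lines may be useful when preparing
art or choosing colors. This is only a sketch: the 8-bit RGB representation
and the function names are assumptions, and it does not model what the
blender actually produces when an overflow does occur.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if pixel + memory stays within 0..255 on every channel. */
static int add_is_safe(const unsigned char pixel[3],
                       const unsigned char mem[3])
{
    int i;
    assert(pixel != NULL && mem != NULL);
    for (i = 0; i < 3; i++) {
        if ((int)pixel[i] + (int)mem[i] > 255)
            return 0;               /* this channel would overflow */
    }
    return 1;
}

/* Largest pixel value that can be added to a given memory value. */
static unsigned char add_headroom(unsigned char mem)
{
    return (unsigned char)(255 - mem);
}
```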

The NOOP mode is simply a mode that disables reading of color and Z and
zeros the rest of the blender state. You should set this render mode when the
cycle type is either G_CYC_FILL or G_CYC_COPY.

The PASS mode is used when the cycle type is G_CYC_2CYCLE. In this case
you may not want to do anything on the first cycle but blend in the second
cycle. An example is: gsDPSetRenderMode(G_RM_PASS,
G_RM_AA_SURF).

Table 16-8 Point-Sampled Non-Z-Buffered Rendering Modes



Creating New Blender Modes

There are two types of mode bits in the blender, cycle-dependent and
cycle-independent. The blender mux controls are cycle-dependent since
they may differ between cycle 0 and cycle 1. All the other mode bits in the
blender do not change between cycle 0 and cycle 1. The
g*DPSetRenderMode() command is set up to take two arguments. See the
discussion in "Antialiasing Modes" on page 204 for details on how to make
calls with g*DPSetRenderMode().

To define a new RenderMode you must create a new macro that takes the
cycle number (1 or 2) as an argument. For example:

#define RM_AA_ZB_OPA_SURF(clk)                                    \
        AA_EN | Z_CMP | Z_UPD | IM_RD | CVG_DST_CLAMP |           \
        ZMODE_OPA | ALPHA_CVG_SEL |                               \
        GBL_c##clk(G_BL_CLR_IN, G_BL_A_IN, G_BL_CLR_MEM, G_BL_A_MEM)

This macro OR's the mode bits that are not cycle-dependent together with
the blender mux controls that are cycle-dependent. Next define two macros
that instance the macro above for each clock cycle:

#define G_RM_AA_ZB_OPA_SURF     RM_AA_ZB_OPA_SURF(1)

#define G_RM_AA_ZB_OPA_SURF2    RM_AA_ZB_OPA_SURF(2)

To use this mode you could make the following call:

gsDPSetRenderMode(G_RM_AA_ZB_OPA_SURF, G_RM_AA_ZB_OPA_SURF2)

Note: Creating new controls for the blender mux is fairly straightforward.
Setting the other blender modes, however, presumes a detailed
understanding of the hardware since many of these modes are
interdependent.
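
The token-pasting pattern used by these macros can be tried on a host
machine with a self-contained sketch. Every DEMO_ name and bit value
below is a stand-in chosen for this example (consult gbi.h for the real
encodings); the sketch only shows how "c##clk" selects a per-cycle mux
packer while the cycle-independent bits stay the same.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values for illustration only; see gbi.h for the real ones. */
#define DEMO_AA_EN      0x0008u
#define DEMO_Z_CMP      0x0010u
#define DEMO_CLR_IN     0u
#define DEMO_A_IN       0u
#define DEMO_CLR_MEM    1u
#define DEMO_A_MEM      1u

/* One mux packer per cycle; each cycle owns separate bit fields. */
#define DEMO_GBL_c1(p, a, m, b) \
    (((p) << 30) | ((a) << 26) | ((m) << 22) | ((b) << 18))
#define DEMO_GBL_c2(p, a, m, b) \
    (((p) << 28) | ((a) << 24) | ((m) << 20) | ((b) << 16))

/* "c##clk" pastes the argument onto the name: clk = 1 -> DEMO_GBL_c1. */
#define DEMO_RM_SURF(clk)                                          \
    (DEMO_AA_EN | DEMO_Z_CMP |                                     \
     DEMO_GBL_c##clk(DEMO_CLR_IN, DEMO_A_IN, DEMO_CLR_MEM, DEMO_A_MEM))

#define DEMO_MUX_MASK   0xffff0000u

/* Combine the two per-cycle words the way a two-argument
 * render mode command would (modelled here as a bitwise OR). */
static uint32_t demo_render_mode(uint32_t c1, uint32_t c2)
{
    assert((c1 & c2 & DEMO_MUX_MASK) == 0);  /* mux fields must not collide */
    return c1 | c2;
}
```

The low (cycle-independent) bits are identical in both expansions, so
OR-ing the two words leaves them unchanged while merging the two sets of
mux selects.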



Visualizing Coverage

As a special bonus render mode, we have added G_RM_VISCVG. This
mode will display coverage in the frame buffer as gray-scale intensities. To
use this mode:

1. Render your entire scene, but don't send FullSync yet.

2. Send the following display list:
        gsDPPipeSync(),
        gsDPSetCycleType(G_CYC_1CYCLE),
        gsDPSetBlendColor(255, 255, 255, 255),
        gsDPSetPrimDepth(0xffff, 0xffff),
        gsDPSetDepthSource(G_ZS_PRIM),
        gsDPSetRenderMode(G_RM_VISCVG, G_RM_VISCVG2),
        gsDPFillRectangle(0, 0, SCREEN_WD-1, SCREEN_HT-1),

Partial coverage will be displayed as darker shades of gray and full coverage
will be displayed as almost white. Try experimenting with different
antialiasing methods while visualizing the coverage to increase your
understanding of these algorithms.



Chapter 17

Sprites



This chapter describes the use of Sprites. Sprites are rectangular images or
textures that you draw on the screen. Large images must be drawn in small
pieces called "tiles." Managing these pieces is the task of the Sprite Library
and associated data structures. This chapter explains how to do simple
things, such as clear the framebuffer with a specified image, and how to do
complex things, such as draw multi-colored text or explosions.

Here is a simple outline for this chapter:

Application Programmers Interface (API)
    Making
    Manipulating
    Drawing

Data Structures and Attributes
    Bitmaps
    Sprites
    Attributes

Tricks and Techniques
    Sparse Sprites
    Early Ending
    Variable Size Bitmaps
    Explosions
    Bitmap Re-use
    Sprite Re-use

Examples
    Backgrounds
    Text (Fonts)
    Simple Game






Application Program Interface (API) 



Making Sprites

Sprites are usually used to draw images onto the screen. For these simple
cases, a few scripts are provided to automatically take a specified image and
generate an appropriate sprite data structure. The generated sprite may then
be edited manually or modified at run time to create dynamic behavior.

mksprite name imgfile.rgb tileX tileY overlap > sp_name.h

This program takes a Silicon Graphics image file and generates a sprite. This
sprite consists of a number of individual bitmaps (tiles) that are tileX apart
in the x direction and tileY apart in the y direction. If overlap is "0," then
these bitmaps are exactly tileX by tileY in size and should not be scaled (see
spScale()). If overlap is "1," then the tiles are (tileX+1) by (tileY+1) in size.
These sprites may be scaled and the textures will be properly interpolated.
This extra pixel of overlap, or "border," provides the required data to create
smooth transitions between tiles. The generated file may be included in an
application and the sprite may be manipulated with the name "name."

mkisprite name imgfile.rgb tileX tileY overlap > sp_name.h

This command is just like mksprite, except that it converts the image to an
8-bit Color Index format, computes the TLUT, and generates the sprite with
all the appropriate changes to support this format.
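
To get a feel for how the tileX, tileY and overlap arguments determine the
tiling, the following C sketch computes the tile grid for an image. It is not
the mksprite implementation: the TileGrid type and function name are
hypothetical, and rounding a partial tile at the right or bottom edge up to a
whole tile is an assumption.

```c
#include <assert.h>

typedef struct {
    int across;    /* number of tiles in the x direction */
    int down;      /* number of tiles in the y direction */
    int tile_w;    /* texels stored per tile, x */
    int tile_h;    /* texels stored per tile, y */
} TileGrid;

static TileGrid tile_grid(int img_w, int img_h,
                          int tileX, int tileY, int overlap)
{
    TileGrid g;
    assert(img_w > 0 && img_h > 0 && tileX > 0 && tileY > 0);
    assert(overlap == 0 || overlap == 1);

    /* Tiles are tileX/tileY apart; assume partial tiles are rounded up. */
    g.across = (img_w + tileX - 1) / tileX;
    g.down   = (img_h + tileY - 1) / tileY;

    /* With overlap, each tile carries one extra border texel. */
    g.tile_w = tileX + overlap;
    g.tile_h = tileY + overlap;
    return g;
}
```

For a 320 by 240 image with 32 by 32 tiles this gives a 10 by 8 grid; with
overlap set to 1 each tile stores 33 by 33 texels while the tile spacing stays
at 32.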



Manipulating Sprites 

void spInit(Gfx **glistp)

This routine is called at the beginning of sprite drawing. Some GBI display
list commands are added to the specified glistp to get the RCP into the
correct mode for sprite rendering. This sets default texturing modes.

void spFinish(Gfx **glistp)

This routine is called at the end of sprite drawing. Some GBI display list
commands are added to the specified glistp to get the RCP to complete all
pending drawing operations and reset the RCP to its regular state. It also
tacks on a gEndDisplayList().

void spMove(Sprite *sp, s32 x, s32 y)

This routine sets the screen position of the upper left-hand corner of the
Sprite.

void spScale(Sprite *sp, f32 sx, f32 sy)

This routine sets the resizing amount of this sprite. Scales may be less than
1.0 to produce a smaller image, or greater than 1.0 to create an expanded
image.

void spSetZ(Sprite *sp, s32 z)

This routine sets the z-buffer depth of the sprite. This may cause the sprite
to be obscured by previously drawn sprites that were drawn with a smaller
value of Z.

void spColor(Sprite *sp, u8 red, u8 green, u8 blue, u8 alpha)

This routine sets the color of the sprite. Based on how the sprite is to be
drawn, this could be either the PRIMITIVE_COLOR or the FILL_COLOR.

void spSetAttribute(Sprite *sp, s32 attr)

This routine sets the indicated attributes. "attr" can be the bit-wise OR of
many attributes.

void spClearAttribute(Sprite *sp, s32 attr)

This routine clears the indicated attributes. "attr" can be the bit-wise OR of
many attributes.

void spScissor(s32 xmin, s32 xmax, s32 ymin, s32 ymax)

This routine specifies the bounding region in which sprites will be drawn.
By default, this region is initialized with xmin=0, xmax=319,
ymin=0, and ymax=239.
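
The effect of the scissor region can be pictured with a small host-side
sketch that clips a rectangle against it. The DemoRect type and demo_clip()
are hypothetical helpers for illustration (the Sprite Library performs its
own clipping internally); only the default bounds are taken from the text
above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int xmin, xmax, ymin, ymax;    /* inclusive screen bounds */
} DemoRect;

/* Default scissor region as documented for spScissor(). */
static const DemoRect demo_default_scissor = { 0, 319, 0, 239 };

/* Clip r in place against sc. Returns 1 if any part remains visible. */
static int demo_clip(DemoRect *r, const DemoRect *sc)
{
    assert(r != NULL && sc != NULL);
    if (r->xmin < sc->xmin) r->xmin = sc->xmin;
    if (r->xmax > sc->xmax) r->xmax = sc->xmax;
    if (r->ymin < sc->ymin) r->ymin = sc->ymin;
    if (r->ymax > sc->ymax) r->ymax = sc->ymax;
    return r->xmin <= r->xmax && r->ymin <= r->ymax;
}
```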

Drawing Sprites

Gfx *spDraw(Sprite *sp)

This routine constructs a display list starting at sp->next_dl that draws the
sprite into the framebuffer in the indicated way. This display list is
terminated with a gEndDisplayList() entry, and the sp->next_dl entry is
updated to point to one entry past it. The pointer to the start of this display
list is returned.



Data Structures and Attributes



Bitmap Structure

Here is the actual structure of a single bitmap:

typedef struct bitmap {
        s16     width;          /* Size across to draw in texels */
                                /* Done if width = 0 */
        s16     width_img;      /* Actual size across in texels */
        s16     s;              /* Horizontal offset into bitmap */
                                /* if (s > width_img), then load only! */
        s16     t;              /* Vertical offset into base */
        void    *buf;           /* Pointer to bitmap data */
                                /* Don't re-load if new buf */
                                /* is the same as the old one */
                                /* Skip if NULL */
        s16     actualHeight;   /* True Height of this bitmap piece */
        s16     LUToffset;      /* LUT base index (for 4-bit CI Texs) */
} Bitmap;



Sprite Structure

typedef struct sprite {
        s16     x, y;           /* Target position */
        s16     width,          /* Target size (before scaling) */
                height;
        f32     scalex,         /* Texel to Pixel scale factor */
                scaley;
        s16     expx, expy;     /* Explosion spacing */
        u16     attr;           /* Attribute Flags */
        s16     zdepth;         /* Z Depth */
        u8      red,            /* Primitive Color */
                green,
                blue,
                alpha;
        u16     startTLUT;      /* Lookup Table Entry Starting index */
        s16     nTLUT;          /* Total number of LUT Entries */
        s16     *LUT;           /* Pointer to Lookup Table */
        s16     istart;         /* Starting bitmap index */
        s16     istep;          /* Bitmap index step (see SP_INCY) */
                                /* if 0, then variable width bitmaps */
        s16     nbitmaps;       /* Total number of bitmaps */
        s16     ndisplist;      /* Total number of display-list words */
        s16     bmheight;       /* Bitmap Texel height (Used) */
        s16     bmHeightReal;   /* Bitmap Texel height (Real) */
        u8      bmfmt;          /* Bitmap Format */
        u8      bmsiz;          /* Bitmap Texel Size */
        Bitmap  *bitmap;        /* Pointer to first bitmap */
        Gfx     *rsp_dl;        /* Pointer to RSP display list */
        Gfx     *rsp_dl_next;   /* Pointer to next RSP DL entry */
} Sprite;



Attributes 

Sprite attributes permit sprites to be used in a variety of different ways. The
following detailed description of each attribute indicates how setting or
clearing that attribute affects the appearance of the drawn sprite. Note also
that these attributes are as independent as possible, thus greatly expanding
the available variety and uses for sprites.

SP_TRANSPARENT

This attribute permits the Alpha blending of the sprite texture with the
background.

SP_CUTOUT

Use alpha compare hardware to not draw pixels with an alpha less than the
blend color alpha (automatically set to 1).

SP_HIDDEN 

This attribute makes spDraw() on the sprite return without generating a
display list.

SP_Z

This attribute specifies that Z-buffering should be on while drawing the
sprite.

SP_SCALE 

This attribute specifies that the sprite should be scaled in both X and Y by the 
amount indicated in scalex and scaley. 

SP_FASTCOPY

This attribute indicates that the sprite should be drawn in COPY mode. This
produces the fastest possible drawing speed for background clears.

SP_TEXSHIFT 

This attribute indicates that textures are to be shifted exactly 1/2 texel in
both s and t before drawing. This creates a better antialiased edge along
transparent texture boundaries when in cutout mode.
SP_FRACPOS

This attribute indicates that the frac_s and frac_t fields of the sprite structure
are to be used to fine-position the texture into the drawn pixels.

SP_TEXSHUF

This attribute indicates that the tile textures have their odd lines pre-shuffled
to work around a LoadTextureBlock problem. See the Texture Mapping
chapter for more details on this problem.

SP_EXTERN

This attribute indicates that existing drawing modes are to be used rather
than the sprite routines explicitly setting them.
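
Because the attributes are independent of one another, they combine naturally as bit flags in the attr field. The following is a minimal sketch of that idea, not library code: the EX_SP_* bit values, the ExSprite type, and the helper names are all hypothetical stand-ins (the real SP_* values come from the sprite library header and may differ).

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only; the real
 * SP_* values are defined by the sprite library header. */
enum {
    EX_SP_TRANSPARENT = 1u << 0,
    EX_SP_CUTOUT      = 1u << 1,
    EX_SP_HIDDEN      = 1u << 2,
    EX_SP_Z           = 1u << 3,
    EX_SP_SCALE       = 1u << 4
};

/* Stand-in for the attr field of the Sprite structure (u16 attr). */
typedef struct {
    uint16_t attr;
} ExSprite;

/* Set one or more attribute bits without disturbing the others. */
static void sprite_set_attr(ExSprite *sp, uint16_t flags)
{
    sp->attr |= flags;
}

/* Clear one or more attribute bits without disturbing the others. */
static void sprite_clear_attr(ExSprite *sp, uint16_t flags)
{
    sp->attr &= (uint16_t)~flags;
}

/* Nonzero when every bit in flags is currently set. */
static int sprite_has_attr(const ExSprite *sp, uint16_t flags)
{
    return (sp->attr & flags) == flags;
}
```

With this layout, turning scaling on or off never affects transparency, which is the independence the text describes.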
Tricks and Techniques



Sparse Sprites

The buf in a bitmap entry may be NULL to indicate that nothing should be
drawn. This area will be 100% transparent.



Early-Ending Sprites 

Setting the width of a bitmap entry to zero (0) signals an early exit to
drawing the sprite's bitmaps.



Variable Size Bitmaps

Each bitmap can have a different drawn "width" and the corresponding
texture can have a different width_img. To vary the vertical size of a sprite,
set the actualHeight field. If this is bigger than the sprite's bmHeightReal,
then this actualHeight is used for loading TMEM.



Explosions 

Each sprite can specify the spacing between tiles in pixels by setting the
expx and expy fields. The default value is zero (0). This spacing is not
affected by the scaling of the sprite. 



Bitmap Re-use 

If the buf of the current bitmap matches the buf of the previous bitmap (not
counting NULL bufs) in this sprite, then TMEM will not be re-loaded. This 
very simple form of texture caching is used in the font example. 
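
The re-use rule can be sketched as a small loop that remembers the last non-NULL buf. This is an illustration of the rule as stated, not the library's implementation; ExBitmap and count_tmem_loads are hypothetical names, and only the buf pointer of a bitmap entry is modeled.

```c
#include <stddef.h>

/* Simplified stand-in for a bitmap entry; only buf matters here. */
typedef struct {
    const void *buf;   /* texture data, or NULL for "draw nothing" */
} ExBitmap;

/* Count how many TMEM loads a run of bitmaps would need when a load
 * is skipped whenever buf matches the previous non-NULL buf. */
static int count_tmem_loads(const ExBitmap *bm, int n)
{
    const void *last = NULL;   /* buf currently resident in TMEM */
    int loads = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (bm[i].buf == NULL)
            continue;          /* sparse entry: nothing drawn, cache untouched */
        if (bm[i].buf != last) {
            loads++;           /* different texture: TMEM must be reloaded */
            last = bm[i].buf;
        }
    }
    return loads;
}
```

Ordering bitmaps so that entries sharing a buf are adjacent keeps the load count low, which is presumably why the approach suits fonts, where many characters share one texture.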
Sprite Re-use

Each sprite has an associated display list and an associated next_dl pointer.
When spDraw is called, new display list entries are added to the area 
pointed at by next_dl. This doesn't have to correspond to the pre-allocated 
display list allocated for the sprite; it could point somewhere else. 

This allows a sprite to get drawn multiple times, each with a different setting 
of some parameters (position, scale, color, texture, and so on).
Sufficient display list area must be allocated for this to operate correctly. 



NU6-06-0030-001G of October 21, 1996



NINTENDO 64 PROGRAMMING MANUAL DRAFT 



Examples 



A sample sprite library demonstration program is provided under
/usr/src/PR/spgame. The demo shows how to use the sprite library to do
backgrounds, text, and a simple application.



Backgrounds 

Setting up copy mode. Using TLUTs to animate it.
Scrolling Background example (up/down, left/right)

Text (Fonts) 

void text_sprite(Sprite *txt, char *str, Font *fnt, int xlen, int ylen)

This creates the appropriate bitmap to render the specified string in the
indicated sprite. You can use a two-pass approach to render a larger number
of characters.

Simple Game 

Anyone for a quick game of pong? Explosions, animated textures. Too much
fun!

Chapter 18

Sprite Microcode 



This chapter describes the use and operation of the Sprite Microcode, an
alternative to the Sprite C Library described in the previous section.

The motivations for the creation of the Sprite Microcode were to provide an
API which was more familiar to traditional 2D content developers, as well
as offloading expensive calculations from the CPU to the otherwise largely
idle RSP. By making use of the Sprite Microcode, applications gain access to
additional CPU cycles per frame to perform game related computations.

The Sprite Microcode can co-exist with the Sprite Library in an application.
Depending on the situation, either the Sprite C Library or the Sprite
Microcode will be more appropriate at particular points in the game. One
example where the Sprite C Library would be more appropriate is for
drawing text on the screen. An example where the Sprite Microcode would
be more appropriate is the display of large textured background images
which would require a large amount of CPU time by the Sprite Library to
set up. The two APIs are also fairly different in their styles and the features
they support. Developers are encouraged to try both methods to see which
fits their needs more closely.

Sprite Microcode Functionality



The functionality provided by the Sprite Microcode is the ability to display
a subimage of arbitrary location and size out of a larger DRAM resident
image of arbitrary texture type and size with optional scaling or mirroring
in the X/Y axes.

Figure: a larger-than-4K subimage of a large DRAM texture image, drawn as an
X/Y scaled/mirrored screen image.

Sprite Microcode API



The API provided for access to the Sprite Microcode is encapsulated into two
new instructions illustrated by the following code fragment:

    #include "gu.h"
    #include "gbi.h"

    uSprite MySprite;

    guSprite2DInit(&MySprite, ImagePointer, TlutPointer,
                   ImageWidth, RectangleWidth,
                   RectangleHeight,
                   ImageType, ImageSize,
                   TextureScaleX, TextureScaleY,
                   FlipTextureX, FlipTextureY,
                   TextureStartS, TextureStartT,
                   TranslateHorizontal, TranslateVertical);

    gSPSprite2D(glistp++, OS_K0_TO_PHYSICAL(&MySprite));

where MySprite is defined as a structure of type:

    typedef struct {
        void  *SourceImagePointer;
        void  *TlutPointer;
        short Stride;
        short SubImageWidth;
        short SubImageHeight;
        char  SourceImageType;
        char  SourceImageBitSize;
        short ScaleX;
        short ScaleY;
        char  FlipTextureX;
        char  FlipTextureY;
        short SourceImageOffsetS;
        short SourceImageOffsetT;
        short PScreenX;
        short PScreenY;
        char  dummy[2];
    } uSprite_t;

    typedef union {
        uSprite_t s;
        long long int force_structure_allignment[4];
    } uSprite;

The parameters are defined as follows:

SourceImagePointer  The address of the texture image in memory out of
which a subrectangle is to be displayed.

TlutPointer  The address of an optional color index table for use with CI
images. Use NULL for non-CI images.

Stride  The width in texels of the original base image in memory.

SubImageWidth  The width in texels of the subimage which is to be
displayed.

SubImageHeight  The height in texels of the subimage which is to be
displayed.

SourceImageType  The format of the texture image in memory. All
supported hardware texture formats are allowed.

SourceImageBitSize  The number of bits per texel of the input image.
All supported hardware texture sizes are allowed.

ScaleX, ScaleY  The s5.10 fixed point axis scaling ratios which are to be
applied to the input image. A value of 1024 specifies 1 to 1 scaling. A value
of 512 specifies that each input texel should be scaled up to 2 output screen
pixels. Scale values should be <= 1024 in order to prevent sampling artifacts
from occurring. Scale values must be positive. Use the FlipTextureX or
FlipTextureY parameters to create negatively scaled images.

FlipTextureX, FlipTextureY  Specifies whether the image should be
mirrored in the X or Y direction before display.

SourceImageOffsetS, SourceImageOffsetT  The offset in texel rows
or columns from the origin of the input base image where the texture
subrectangle which is to be displayed starts.

PScreenX, PScreenY  Specifies the starting X or Y location in screen
coordinates of the output image. The origin is in the upper left corner of the
screen.
The guSprite2DInit() call merely copies its parameters into the passed
in uSprite structure. The call can be eliminated if the application sets up the
structure directly.
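
Setting up the structure by hand might look like the sketch below. It is an illustration under stated assumptions rather than the library's own code: the structure is re-declared locally from the definition above so the fragment compiles on its own, EX_IMAGE_TYPE and EX_IMAGE_BITSIZE are placeholder values standing in for the image format and size constants from gbi.h, and scale_to_s5_10 and setup_sprite are hypothetical helper names. The helper follows the scaling rule given for ScaleX and ScaleY (1024 is 1 to 1, 512 doubles the size).

```c
#include <stddef.h>

/* Local copy of the structure shown above, so the sketch is self-contained. */
typedef struct {
    void  *SourceImagePointer;
    void  *TlutPointer;
    short Stride;
    short SubImageWidth;
    short SubImageHeight;
    char  SourceImageType;
    char  SourceImageBitSize;
    short ScaleX;
    short ScaleY;
    char  FlipTextureX;
    char  FlipTextureY;
    short SourceImageOffsetS;
    short SourceImageOffsetT;
    short PScreenX;
    short PScreenY;
    char  dummy[2];
} uSprite_t;

/* Placeholder values; a real program would use the gbi.h constants. */
#define EX_IMAGE_TYPE    0
#define EX_IMAGE_BITSIZE 2

/* Convert a magnification factor (2.0 = twice as large on screen)
 * to the s5.10 ratio: 1024 is 1:1 and 512 doubles the size. */
static short scale_to_s5_10(double magnification)
{
    return (short)(1024.0 / magnification + 0.5);
}

/* Fill the structure for a 64x32 subimage of a 320-texel-wide image,
 * drawn at twice its size at screen position (16, 24). */
static void setup_sprite(uSprite_t *s, void *image)
{
    s->SourceImagePointer = image;
    s->TlutPointer        = NULL;   /* non-CI image */
    s->Stride             = 320;    /* base image width in texels */
    s->SubImageWidth      = 64;
    s->SubImageHeight     = 32;
    s->SourceImageType    = EX_IMAGE_TYPE;
    s->SourceImageBitSize = EX_IMAGE_BITSIZE;
    s->ScaleX             = scale_to_s5_10(2.0);
    s->ScaleY             = scale_to_s5_10(2.0);
    s->FlipTextureX       = 0;
    s->FlipTextureY       = 0;
    s->SourceImageOffsetS = 0;
    s->SourceImageOffsetT = 0;
    s->PScreenX           = 16;
    s->PScreenY           = 24;
}
```

In an application the same assignments would target the s member of a uSprite union, which is then handed to gSPSprite2D as in the fragment above.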

The Sprite Microcode automatically handles the division of the input
subimage into 4K texture segments, loads them into TMEM and issues the
appropriate RDP commands to set up and render a series of connected
Texture Rectangles to display the subimage at the desired location and
scaling. The Sprite Microcode keeps track of the s and t coordinates for the
generated texture subrectangles.
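
To get a feel for how many segments a subimage needs, the arithmetic can be sketched as below. This is a rough estimate only: it assumes the subimage is cut into horizontal strips of whole rows with no per-row padding, which may not match how the microcode actually divides the image. The names EX_TMEM_BYTES, rows_per_segment, and segment_count are hypothetical.

```c
/* Segment capacity in bytes, per the "4K texture segments" above. */
#define EX_TMEM_BYTES 4096

/* Rows of a subimage that fit in one segment, assuming whole rows and
 * no per-row padding. Returns 0 if a single row does not fit. */
static int rows_per_segment(int width_texels, int bits_per_texel)
{
    int bytes_per_row = (width_texels * bits_per_texel + 7) / 8;
    if (bytes_per_row <= 0)
        return 0;
    return EX_TMEM_BYTES / bytes_per_row;
}

/* Number of horizontal strips needed to cover the subimage height,
 * or -1 when a row is too wide for this simple strip model. */
static int segment_count(int width_texels, int height_texels,
                         int bits_per_texel)
{
    int rows = rows_per_segment(width_texels, bits_per_texel);
    if (rows == 0)
        return -1;
    return (height_texels + rows - 1) / rows;
}
```

For example, a 64x64 subimage at 16 bits per texel uses 128 bytes per row, so 32 rows fit in one segment and two segments cover the image under these assumptions.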

The Sprite Microcode clips the coordinates for the generated texture
rectangles to prevent overflow of the RDP screen space registers. Texture
Rectangles which have their X or Y starting values less than zero are clipped
and their starting s and t texture coordinates adjusted so that they begin at
the screen boundary. Texture rectangles which have their ending Y value less
than zero or their starting Y value > 1023.75 are thrown away entirely.
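
The clipping rule can be illustrated with a small host-side sketch. It is an interpretation of the paragraph above, not the microcode itself: coordinates are kept as floating-point values for readability, and ExTexRect and clip_tex_rect are hypothetical names.

```c
/* One generated texture rectangle, in simplified floating-point form. */
typedef struct {
    double x0, y0;   /* upper-left screen position */
    double x1, y1;   /* lower-right screen position */
    double s0, t0;   /* texture coordinate at (x0, y0) */
} ExTexRect;

/* Clip a rectangle against the top and left screen edges.
 * texels_per_pixel_x/y give the texture step per screen pixel.
 * Returns 0 if the rectangle is discarded, 1 if it should be drawn. */
static int clip_tex_rect(ExTexRect *r, double texels_per_pixel_x,
                         double texels_per_pixel_y)
{
    if (r->y1 < 0.0 || r->y0 > 1023.75)
        return 0;                               /* thrown away entirely */
    if (r->x0 < 0.0) {
        r->s0 += -r->x0 * texels_per_pixel_x;   /* skip the hidden texels */
        r->x0 = 0.0;
    }
    if (r->y0 < 0.0) {
        r->t0 += -r->y0 * texels_per_pixel_y;
        r->y0 = 0.0;
    }
    return 1;
}
```

A rectangle that starts 8 pixels left of the screen at a 2x magnification (0.5 texels per pixel) is drawn from x = 0 with s advanced by 4 texels, so the visible part of the image stays in place.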

More information about the Sprite Microcode can be found in the man pages
for gSPSprite2D(3P) and guSprite2DInit(3P).

Chapter 19

The Audio Library 



The Nintendo 64 Audio Library is a lightweight library of functions. It
provides game developers with the ability to interactively synthesize and
manipulate audio on the Nintendo 64. It provides support for both sampled
sound playback and wavetable synthesis. This is accomplished with four
software objects: the Sound Player, the Sequence Player, the Synthesis
Driver, and the Audio Synthesis Microcode. These are shown in Figure 19-1,
"Audio Software Architecture."

• The Sound Player is useful for the playback of single sample sound
effects or streamed audio. It is capable of playing back either ADPCM
compressed sounds, or uncompressed 16-bit sound.

• The Sequence Player can exist in either of two types. The first type
plays back Type 0 MIDI sequence files and the second type plays back a
form of compressed MIDI unique to the Nintendo 64. In both cases,
the sequence player handles sequence, instrument bank, and
synthesizer resource allocation, sequence interpretation, and MIDI
message scheduling.

Note: Both the Sequence Player and the Sound Player are clients of the
Synthesis Driver. The Driver can support an arbitrary number of clients,
including multiple Sound and Sequence Players.

• The Synthesis Driver is responsible for creating audio Command Lists, 
which are packaged into tasks by the Application program and passed 
on to the Audio Synthesis Microcode. It allows Driver clients to assign 
wave tables to synthesizer voices, and control the playback parameters. 






• The Audio Synthesis Microcode processes the tasks passed to it by the
application and synthesizes stereo 16-bit samples, which the
application in turn passes to the Audio DACs.

This chapter contains descriptions of the Sound Player, Sequence Player, and
Synthesis Driver APIs. Many application programmers will be satisfied
with the interfaces provided by the Sound and Sequence Players. Most of the
Synthesis Driver API is intended for programmers who want to create their
own players (see the section titled "Writing Your Own Player" for more
information); however, all programmers should understand certain
functions essential for the creation of audio Command Lists.



Figure 19-1 Audio Software Architecture 
The following sections outline the data structures and API calls that are
necessary to make use of the audio library. Further details on some of the
data structures can be found in Chapter 15. The data structure definitions
and function prototypes for the calls described are in the include file
libaudio.h, which is part of the software release. Also included as a part of the
software release are reference (man) pages for each of the function calls.

Generating Audio Output



The basic process for generating and playing audio can be summed up by
the following steps.

1. Create and initialize the necessary resources. (Typically, an audio
heap, a synthesizer, and a player.)

2. Repeatedly make calls to alAudioFrame to generate the audio task lists.

3. Execute these audio task lists on the RSP.

4. Set the output DACs to point to the audio output, with a call to
osAiSetNextBuffer().

The creation and initialization of the necessary resources is somewhat
dependent on your application's needs, but typically you will need to take
the following steps:

1. Create an audio heap with a call to alHeapInit.

2. Set the hardware output frequency with a call to osAiSetFrequency.

3. Create a synthesizer with a call to alInit(). (alInit will require that you
have a callback routine to initialize the audio DMA structures.)

4. Create message queues for receiving signals that allow you to time your
audio processing.

5. Create a player (such as a sound player or sequence player) to sign into
the synthesizer.

6. Initialize the resources specific to the player(s) that you have created. 
Representing Sound

The Audio Library supports playback of both uncompressed and ADPCM
compressed, 16-bit audio. An audio waveform is represented with the
Sound object via the ALSound structure. The ALSound structure contains
entries for the Envelope, Pan, and Volume, along with a pointer to the
ALWaveTable structure (which contains the audio).

Collections of sounds can be stored in an ALBankFile structure. The format
of this structure is described in Chapter 21, "Audio File Formats". The tools
available to create Bank Files for inclusion in the ROM are described in
Chapter 20, "Audio Tools".

Note: Currently, the only supported sample formats are single-channel,
ADPCM compressed and 16-bit uncompressed.



Playing Sounds

The Sound Player is the mechanism by which the Audio Library plays back
individual sounds, such as isolated sound effects. It is responsible for
allocating the resources needed to play a sound and for controlling the
performance of the sound data for the application.

There are certain steps you must take for your game to play a sound. At a
minimum, you must:

1. Create and initialize the basic resources described in the section
Generating Audio Output.

2. Instantiate the Sound Player with alSndpNew(). The Sound Player
created also signs in as a client to the Synthesis Driver.

3. Copy the sound bank's .ctl file into RAM, and initialize it with a call to
alBnkfNew.

4. Allocate a sound with a call to alSndpAllocate().

5. Set the Sound Player's target sound to reference your sound with
alSndpSetSound().

6. Play the sound with alSndpPlay().

7. Stop the sound when you are finished with alSndpStop(). Note that if
the sound is not looped, the sound player will take care of stopping the
sound when it is finished playing. However, you can stop the sound at
any time during playback with this call.

When the sound is no longer needed, the resources in the Sound Player can
be freed with a call to alSndpDeallocate(). If the Sound Player itself is no
longer required, it can be removed from the Synthesis Driver client list with
alSndpDelete().

The Sound Player can play both looped and unlooped sounds. When
playing a sound, the Sound Player steps through the Envelope states Attack,
Decay, and Release. Envelope parameters are defined in the ALSound
structure. The duration of the sound is determined by the sum of the Attack
time, Decay time, and Release time, or the length of the wave table
(whichever is shorter), scaled by the pitch.

For looped sounds, the duration is always determined by the Envelope
parameters and the pitch. If the Envelope Decay time is set to -1, the sound
will continue playing (that is, it will never enter the Release phase) until it is
stopped by the application with a call to alSndpStop(). Envelope times are
scaled by the playback pitch so that regardless of pitch, finite-length sounds
play to completion. For example, by default, a sound played an octave lower
plays for twice as long as it does at unity pitch. Loop points for sounds are
embedded in the ALWaveTable structure. (Loop points will be
automatically extracted from the .aiff file when using the file conversion
tools provided.)
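
The duration rule for unlooped sounds can be written out as a short calculation. This is one reading of the paragraphs above, not library code: it assumes pitch is expressed as a frequency ratio (1.0 at unity pitch, 0.5 one octave lower) and that all times are in microseconds. The function name sound_duration_usec is hypothetical.

```c
/* Estimated play time, in microseconds, of an unlooped sound.
 * pitch_ratio is 1.0 at unity pitch and 0.5 one octave lower. */
static double sound_duration_usec(double attack_usec, double decay_usec,
                                  double release_usec, double wave_usec,
                                  double pitch_ratio)
{
    double envelope = attack_usec + decay_usec + release_usec;
    double shorter  = envelope < wave_usec ? envelope : wave_usec;

    /* Lower pitch plays the same data more slowly, so it lasts longer. */
    return shorter / pitch_ratio;
}
```

With a 4000 microsecond envelope and a longer wave table, the sound lasts 4000 microseconds at unity pitch and 8000 microseconds an octave lower, matching the "twice as long" example in the text.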

Various parameters that affect the playback of a sound can be set before and
during playback. When a sound is allocated to a Sound Player, an ID is
returned that uniquely identifies that sound. Parameters for a particular
sound are set by first setting the target sound with a call to
alSndpSetSound(), and then making a subsequent call to set a parameter for
the target sound. Available calls are detailed in Table 19-1.

Note: Each sound allocated to a Sound Player has a unique ID and private
parameter values and play state. To play the same sound simultaneously,
possibly with different parameter settings, it must be allocated multiple
times to the Sound Player.
A summary of Sound Player functions is given below. Details can be found
in the reference (man) pages.

Table 19-1 Sound Player Functions

Function            Description

alSndpNew           Creates a new Sound Player.
alSndpDelete        Removes a Sound Player from the Synthesis
                    Driver's client list.
alSndpAllocate      Allocates a sound to a Sound Player.
alSndpDeallocate    Deallocates a sound from the Sound Player.
alSndpSetSound      Sets the Sound Player's current sound.
alSndpGetSound      Returns the Sound Player's current sound.
alSndpPlay          Plays the Sound Player's current sound.
alSndpPlayAt        Plays a sound at some specified time in the future.
alSndpStop          Stops the current sound from playing.
alSndpGetState      Gets the current state (stopped or playing) of the
                    current sound.
alSndpSetPitch      Sets the pitch for the current sound.
alSndpSetVol        Sets the playback volume of the current sound.
alSndpSetPan        Sets the pan position of the current sound.
alSndpSetPriority   Sets the sound's priority value.
alSndpSetFXMix      Sets the wet/dry mix of the current sound.

Sequenced Sound Playback



You will be concerned with three issues when using sequenced sound on the 
Nintendo 64:

• representing the sequence data

• representing the instruments or sounds that make up the sequence

• controlling the sequence playback 



Representing the Sequence 

The Audio Library supports two different sequence players. The first
sequence player uses Type 0 MIDI sequences. Sequences are represented at
runtime with the ALSeq structure. This structure encapsulates sequence
data that conforms to the Standard MIDI Files 1.0 specification for Type 0
MIDI files. The Type 0 MIDI file format contains time-ordered MIDI
messages that specify music events. It is described in detail in the "Standard
MIDI Files 1.0" specification published by the MIDI Manufacturers
Association.

The second sequence player uses a compressed format of sequence data
unique to the Nintendo 64. This format is detailed in the Audio Formats chapter.
Sequences are represented at runtime with the ALCSeq structure. Besides
differences in the format of the data, the compressed MIDI sequence player
handles loops in a different fashion and does not support markers.

To use a Type 0 MIDI sequence in your game, you must first initialize an
ALSeq structure with alSeqNew(). To use the compressed MIDI sequence
player, you first initialize an ALCSeq structure with alCSeqNew(). After
initializing the ALSeq structure, you can perform sequence operations.

The alSeqNextEvent() call returns the MIDI event at a specified location in
the sequence. The alSeqNewMarker() call creates a sequence position
marker that can be used in conjunction with the Type 0 Sequence Player to
set playback time and loop points. The convenience functions
alSeqTicksToSec() and alSeqSecToTicks() convert between microseconds and MIDI
clock ticks.
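
The relationship between MIDI clock ticks and elapsed time follows from the Standard MIDI Files format: a tempo gives microseconds per quarter note, and the file header gives ticks per quarter note. The sketch below shows that arithmetic for a constant tempo. It is not the implementation of the library's conversion functions, and the names ticks_to_usec and usec_to_ticks are hypothetical.

```c
#include <stdint.h>

/* Convert MIDI clock ticks to microseconds at a constant tempo.
 * usec_per_quarter:  tempo in microseconds per quarter note
 *                    (500000 corresponds to 120 beats per minute)
 * ticks_per_quarter: the division value from the MIDI file header */
static int64_t ticks_to_usec(int64_t ticks, int32_t usec_per_quarter,
                             int32_t ticks_per_quarter)
{
    return ticks * usec_per_quarter / ticks_per_quarter;
}

/* Inverse conversion: microseconds to MIDI clock ticks. */
static int64_t usec_to_ticks(int64_t usec, int32_t usec_per_quarter,
                             int32_t ticks_per_quarter)
{
    return usec * ticks_per_quarter / usec_per_quarter;
}
```

A sequence with tempo changes needs the conversion applied piecewise between tempo events, which is one reason a sequence-aware conversion routine is convenient.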
Note: Normally, you won't call alSeqNextEvent() directly, because it is
called by the Sequence Player during sequence playback.

The sequence calls are described in detail in the reference (man) pages. Brief
descriptions are given in Table 19-2.

Table 19-2 Sequence Functions

Type 0 MIDI        Compressed MIDI
Sequence Player    Sequence Player
Function           Function            Description

alSeqNew           alCSeqNew           Initializes the sequence control
                                       structure.
alSeqNextEvent     alCSeqNextEvent     Returns the next MIDI event from the
                                       sequence.
alSeqNewMarker     alCSeqNewMarker     Initializes a marker for a given event
                                       time.
alSeqGetLoc        alCSeqGetLoc        Sets a marker to the sequence's current
                                       location.
alSeqSetLoc        alCSeqSetLoc        Sets the sequence to the location
                                       specified by the marker.
alSeqTicksToSec    alCSeqTicksToSec    Converts a time value from MIDI clock
                                       ticks to microseconds.
alSeqSecToTicks    alCSeqSecToTicks    Converts a time value from
                                       microseconds to MIDI clock ticks.


Representing Instruments

Instruments are represented at runtime by the ALBankFile structure. This
structure describes the instruments that sound in response to an event in the
sequence. Bank Files are composed of Banks, which are composed of
Instruments, which themselves are composed of groups of Sounds/KeyMaps,
Envelopes, and gain and pan information. The Bank File format
is described in detail in the Audio Formats chapter.

To use a Bank File in your game, you must first create a runtime structure to
represent it. This is accomplished with the alBnkfNew() function (see Table
19-3). Both sequence players use the same function call for this operation.

Table 19-3 Bank Functions

Type 0 MIDI        Compressed MIDI
Sequence Player    Sequence Player
Function           Function            Description

alBnkfNew          alBnkfNew           Initializes a collection of banks for use
                                       with a Sequence Player.



Playing Sequences 

The Sequence Player is the mechanism by which the Nintendo 64 Audio
Library plays back MIDI sequence files. It is responsible for allocating the
hardware and software resources needed to play a sequence and for
controlling the performance of the sequence data for the application.

Note: A Sequence Player can play only one sequence at a time.

There are certain steps you must take for your game to play a music
sequence. The minimum steps needed to use the Type 0 MIDI sequence
player are listed below. Using the compressed MIDI sequence player is
identical, only you use the calls specific to the compressed MIDI sequence
player.

1. Create and initialize the basic resources described in the section
Generating Audio Output.

2. Initialize the sequence by using alSeqNew().

3. Copy the bank file's .ctl file into RAM, and initialize the bank by using
alBnkfNew().

4. Initialize the sequence player by using alSeqpNew().

5. Set the sequence player's bank by using alSeqpSetBank().

6. Set the sequence player's target sequence by using alSeqpSetSeq().

7. Play the sequence by using alSeqpPlay().

8. Stop the sequence when you are finished with it, by using alSeqpStop().

9. If the sequence player is no longer needed, it can be removed from the
Synthesis Driver's client list by using alSeqpDelete().



Table 19-4 Sequence Player Functions

Type 0 MIDI             Compressed MIDI
Sequence Player         Sequence Player
Function                Function                Description

alSeqpNew               alCSPNew                Initializes a Sequence Player.
alSeqpDelete            alCSPDelete             Removes a Sequence Player from
                                                the Synthesis Driver's client list.
alSeqpGetState          alCSPGetState           Returns the current state of the
                                                Sequence Player.
alSeqpSetBank           alCSPSetBank            Assigns a bank of instruments to
                                                the sequence.
alSeqpGetSeq            alCSPGetSeq             Gets a reference to the sequence
                                                that is currently bound to the
                                                Sequence Player.
alSeqpSetSeq            alCSPSetSeq             Makes the specified sequence the
                                                target sequence.
alSeqpPlay              alCSPPlay               Starts the target sequence playing.
alSeqpStop              alCSPStop               Stops the target sequence if it is
                                                playing.
alSeqpGetTempo          alCSPGetTempo           Returns the current playback
                                                tempo for the target sequence.
alSeqpSetTempo          alCSPSetTempo           Sets the current playback tempo of
                                                the target sequence.
alSeqpGetVol            alCSPGetVol             Returns the overall volume for the
                                                sequence.
alSeqpSetVol            alCSPSetVol             Sets the overall volume for the
                                                sequence.
alSeqpGetChlPan         alCSPGetChlPan          Gets the pan on the specified MIDI
                                                channel.
alSeqpSetChlPan         alCSPSetChlPan          Sets the pan for the specified MIDI
                                                channel.
alSeqpGetChlVol         alCSPGetChlVol          Gets the volume for the specified
                                                MIDI channel.
alSeqpSetChlVol         alCSPSetChlVol          Sets the volume for the specified
                                                MIDI channel.
alSeqpGetChlProgram     alCSPGetChlProgram      Returns the program assigned to
                                                the specified MIDI channel.
alSeqpSetChlProgram     alCSPSetChlProgram      Assigns the given program to the
                                                specified MIDI channel.
alSeqpGetChlFXMix       alCSPGetChlFXMix        Gets the wet/dry FX mix on the
                                                specified MIDI channel.
alSeqpSetChlFXMix       alCSPSetChlFXMix        Sets the wet/dry FX mix on the
                                                specified MIDI channel.
alSeqpGetChlPriority    alCSPGetChlPriority     Gets the priority value for the
                                                specified MIDI channel.
alSeqpSetChlPriority    alCSPSetChlPriority     Sets the priority value for the
                                                specified MIDI channel.
alSeqpLoop              (Not Supported)         Sets the loop points for the target
                                                sequence.
alSeqpSendMidi          alCSPSendMidi           Sends the specified MIDI message
                                                to the sequence player.



Loops in Sequence Players

The way in which loops are handled in the sequence players is different.
When using the Type 0 MIDI sequence player, the programmer must create
a marker at the loop start point, and a marker at the loop end point. Then the
sequence can be looped between these two markers using alSeqpLoop().
Using the compressed MIDI sequence player, loops are constructed by the
musician, in the tracks of the sequence, by inserting controllers. (This is
discussed in the chapter "Using the Audio Tools".) This method allows
different loops for different tracks, and allows for nesting of loops.



Controllers in Sequence Players 

The realtime controllers that the Sequence Player responds to are (control
numbers in parentheses): pan (10), volume (7), priority (16), sustain (64), and
reverb amount (91). Note that because only one effects bus is supported,
reverb amount is used to control effect amount no matter what the effect is.

The compact sequence player also uses controllers 102, 103, 104, and 105 for
creating loops. Details of this are discussed in the chapter "Using the Audio
Tools."
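
The controller numbers can be collected into a small lookup for reference. This is an illustrative sketch only: the EX_CTRL_* names and controller_name are hypothetical, volume is taken to be the standard MIDI controller 7, and the individual roles of controllers 102 through 105 are not distinguished because they are covered in the audio tools chapter.

```c
#include <stddef.h>

/* MIDI controller numbers listed above; the names are illustrative. */
enum {
    EX_CTRL_VOLUME   = 7,
    EX_CTRL_PAN      = 10,
    EX_CTRL_PRIORITY = 16,
    EX_CTRL_SUSTAIN  = 64,
    EX_CTRL_FX_AMT   = 91
};

/* Return a short name for a controller the Sequence Player responds
 * to, or NULL for a controller number that is not in the list. */
static const char *controller_name(int controller)
{
    switch (controller) {
    case EX_CTRL_VOLUME:   return "volume";
    case EX_CTRL_PAN:      return "pan";
    case EX_CTRL_PRIORITY: return "priority";
    case EX_CTRL_SUSTAIN:  return "sustain";
    case EX_CTRL_FX_AMT:   return "effect amount";
    case 102:
    case 103:
    case 104:
    case 105:              return "loop (compact sequence player)";
    default:               return NULL;
    }
}
```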






The Synthesis Driver 



The Synthesis Driver is the Audio Library object used by the Sound Player,
the Sequence Player, and application-specific players to create Audio
Command Lists, which are passed to the Audio Microcode. This section
defines various API calls which can be used by application programmers
who want to create their own Players.

Programmers who use the Sequence Player and Sound Player need only be
familiar with the initialization of the driver, the alAudioFrame() function
that creates audio Command Lists, and the mechanism by which the
Synthesis Driver satisfies the need for sound data.



Initializing the Driver

The Synthesis Driver needs to be initialized in order to be used. This is
accomplished by calling alSynNew() with a configuration structure that
specifies the number of virtual voices, physical voices, and effects busses to
instantiate. The configuration structure also provides information regarding
the audio DMA callback routines, the Audio Heap, FXType and the audio
playback rate to use. (Audio DMA callbacks are discussed later in this
chapter.)

Note: The alInit() call will call alSynNew().

The configuration also specifies a callback procedure pointer of type
ALDMANew, which is used by the synthesis driver initialization procedure to
set up callbacks for sound data requests. The procedure specified in the
configuration structure is called once during initialization for every physical
voice that is instantiated. The Synthesis Driver expects the procedure to
return another procedure pointer that defines a callback of type
ALDMAproc, and a pointer to some state information that can be used in
various ways to manage sound data requests.

Note: Only one driver may be instantiated at any given time. 
Building and Executing Command Lists

The main function of the Synthesis Driver is to build Audio Command Lists,
which are executed by the microcode to synthesize audio. Command lists
are built in frames. A frame is a number of samples, usually something
close to the number of samples required to fill a complete video frame time
at the regular video frame rate (e.g., 30 or 60 Hz).

From an application, the Command List (to synthesize a number of audio
samples) is built by making a call to alAudioFrame(). Parameters for this call
define the number of samples (which must be a multiple of 16), a physical
address of an output buffer where the Microcode will put the audio samples,
and a pointer to an array that can be used to store the Command List.
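
Choosing a frame size can be sketched as follows: divide the output rate by the video frame rate and round up to the next multiple of 16. This is a simple starting point under the assumptions stated in the text, not a recommendation from the library; the function name samples_per_frame is hypothetical.

```c
/* Samples to request per call so that one call covers one video
 * frame, rounded up to the multiple of 16 required for the count. */
static int samples_per_frame(int output_rate_hz, int frames_per_sec)
{
    /* Ceiling division: never produce fewer samples than one frame needs. */
    int samples = (output_rate_hz + frames_per_sec - 1) / frames_per_sec;

    /* Round up to the next multiple of 16. */
    return (samples + 15) & ~15;
}
```

At 44100 Hz and 60 frames per second this gives 736 samples per frame. Because the rounded value slightly exceeds the exact per-frame amount, an application that always requests it will gradually run ahead of the DACs, so a practical player would likely vary the request from frame to frame.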

During the construction of the Command List, the Synthesis Driver makes
callbacks to its clients (the players) to process the various events that
determine the parameters and timing of the playback of sound effects and
sequences.

The Driver also makes callbacks to the defined ALDMAproc routine with
requests for sound data (see below).

To execute an audio Command List, it is first put in an OSTask structure and
then passed to the microcode with a call to osSpTaskStart(). The OSTask
structure specifies pointers to microcode and data along with the Command
List, which allows the RCP to execute it.

Synthesis Driver Sound Data Callbacks 

The application is responsible for making sure that the required sound data
is located in RAM before the command list is executed by the audio
microcode. The application programmer has the freedom to load complete
compressed sounds from the ROM before playback, or, as is more likely, to
initiate DMAs from ROM to RAM in response to callbacks from the
Synthesis Driver. Initiating DMAs in response to callbacks allows the
application to load only the portion of the sound needed, and thus greatly
reduce the RAM needed for audio.

The Audio DMA callback routines are initialized when alInit is called. The
synthesizer configuration structure must contain a pointer to a routine for
initializing the Audio DMAs. This routine will be called once for each
physical voice. Typically this routine will initialize any state variables, and
then must return a pointer to the ALDMAproc.

The ALDMAproc procedure is called by each physical voice during the
construction of the command list when compressed sound data is required.
The call specifies the required data address, the length, and the state pointer,
and it expects to receive a physical memory address where the data can be
(or at least will be) found in memory.

The example applications (playseq and simple) provide examples of how
these callback routines can be implemented.
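
The callback pattern described above can be sketched as a small buffer cache. This is an illustrative host-side model under stated assumptions, not the code from the example applications: DmaState, DmaBuffer, dma_state_init, dma_callback, and fake_rom are hypothetical names, memcpy stands in for a real ROM-to-RAM DMA, and the function returns a pointer where a real ALDMAproc would return a physical address. Requests are assumed to lie inside the simulated ROM and to be no longer than BUF_SIZE.

```c
#include <stdint.h>
#include <string.h>

#define ROM_SIZE 4096   /* size of the simulated ROM image */
#define BUF_SIZE 256    /* bytes fetched per simulated DMA */
#define NUM_BUFS 4      /* RAM buffers available to the cache */

typedef struct {
    int32_t  rom_addr;        /* ROM address held, or -1 if empty */
    uint32_t last_used;       /* for least-recently-used replacement */
    uint8_t  data[BUF_SIZE];
} DmaBuffer;

typedef struct {
    DmaBuffer bufs[NUM_BUFS];
    uint32_t  clock;          /* incremented on every request */
    int       dma_count;      /* number of simulated DMAs started */
} DmaState;

static uint8_t fake_rom[ROM_SIZE];   /* stand-in for cartridge ROM */

void dma_state_init(DmaState *s)
{
    int i;

    memset(s, 0, sizeof *s);
    for (i = 0; i < NUM_BUFS; i++)
        s->bufs[i].rom_addr = -1;
}

/* Returns RAM holding ROM bytes [addr, addr + len). */
uint8_t *dma_callback(int32_t addr, int32_t len, void *state)
{
    DmaState  *s = (DmaState *)state;
    DmaBuffer *victim = &s->bufs[0];
    int32_t    n;
    int        i;

    s->clock++;
    for (i = 0; i < NUM_BUFS; i++) {
        DmaBuffer *b = &s->bufs[i];

        if (b->rom_addr >= 0 && addr >= b->rom_addr &&
            addr + len <= b->rom_addr + BUF_SIZE) {
            b->last_used = s->clock;          /* hit: no DMA needed */
            return b->data + (addr - b->rom_addr);
        }
        if (b->last_used < victim->last_used)
            victim = b;                       /* remember the oldest buffer */
    }

    /* miss: refill the least recently used buffer */
    n = (ROM_SIZE - addr < BUF_SIZE) ? ROM_SIZE - addr : BUF_SIZE;
    memcpy(victim->data, fake_rom + addr, (size_t)n);
    victim->rom_addr = addr;
    victim->last_used = s->clock;
    s->dma_count++;
    return victim->data;
}
```

The design choice here is that consecutive requests for nearby data reuse a buffer that is already loaded, so only the portion of a sound actually being played occupies RAM.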



Assigning Players to the Driver 

In order to make calls to the driver interface, you must first make your player
known to the driver. This is accomplished with the alSynAddPlayer() call.
For more information on writing your own player, see the section "Writing
Your Own Player".

Note: Both the Sequence Player and the Sound Player add themselves to the
driver when they are initialized by calling alSynAddPlayer(). If you are not
creating your own players you should not need to call alSynAddPlayer().



Allocating and Controlling Voices

The Synthesis driver manages two types of voices: virtual voices and
physical voices.

Virtual voices are described by the ALVoice structure, and represent the
voice from the player's perspective. In order to play a wavetable, players
must allocate a virtual voice on which to play it. This is accomplished with
the alSynAllocVoice() call. The voice configuration structure allows you to
specify the voice priority and bus assignment. The number of virtual voices
available is established when the driver is initialized, and you may specify
more virtual voices than you have resources to play. There is no benefit to
specifying more physical voices than virtual voices since the player will
have no way to use them.

Physical voices represent the actual sound processing modules available to
the driver. They consist of an ADPCM decompressor, a pitch shifter, and a
gain unit. The ADPCM decompressor converts mono ADPCM compressed
(approximately 4:1) wavetables to mono 16-bit raw format. The pitch shifter
resamples the resulting data (up one octave, down any number of octaves)
to the desired pitch. The gain unit then applies a volume envelope and a pan
value, and mixes the (stereo) output into the master bus and an effect bus at
gains specified by the wet/dry parameters associated with the voice.

The driver maps virtual voices to physical voices based on virtual voice
priority. If there are more active virtual voices than available physical voices,
the driver allocates the physical voices to the highest priority virtual voices.
The driver may "steal" a physical voice from a virtual voice if a higher
priority virtual voice is allocated.
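
The priority rule above can be illustrated with a short sketch. This is not the driver's actual algorithm, which the text does not spell out: PhysVoice and pick_physical_voice are hypothetical names, and the policy of stealing only from a strictly lower priority voice is an assumption drawn from the wording "higher priority".

```c
#include <stddef.h>

/* Hypothetical model of one physical voice. */
typedef struct {
    int in_use;     /* non-zero if mapped to a virtual voice */
    int priority;   /* priority of that virtual voice */
} PhysVoice;

/*
 * Returns the index of the physical voice to give to a new virtual
 * voice of the given priority, or -1 if none can be used.
 */
int pick_physical_voice(const PhysVoice *voices, size_t count,
                        int new_priority)
{
    int    lowest = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        if (!voices[i].in_use)
            return (int)i;          /* free voice: nothing to steal */
        if (lowest < 0 || voices[i].priority < voices[lowest].priority)
            lowest = (int)i;        /* track the lowest priority voice */
    }

    /* steal only from a strictly lower priority virtual voice */
    if (lowest >= 0 && voices[lowest].priority < new_priority)
        return lowest;
    return -1;
}
```

With three busy voices at priorities 10, 5, and 20, a new voice of priority 7 would take over the second voice, while a new voice of priority 5 would get nothing.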

Note: To prevent a voice from being stolen, you can set the voice priority to
the highest priority with alSynSetPriority().

After you allocate a virtual voice, you can use it to play a wavetable with the
alSynStartVoice() call. You can stop the playback with the alSynStopVoice()
call.

Once you start a voice, you can control pitch, volume, panning, and
effect mix with the appropriate calls listed in the section titled "Summary of
Driver Functions".



Effects and Effect Busses 

Each voice can be assigned to one effects bus. Each effects bus can contain
any number of effects units (up to the limit imposed by the processing
resources). The number of busses and effects units are specified in the driver
configuration structure and are established at initialization time.

Note: The Audio Library currently supports only one effects bus. Future
versions may support multiple busses.
Creating Your Own Effects

The Nintendo 64 uses a general purpose effects implementation that
manipulates data in a single delay line. A small number of default
configurations have been supplied (see libaudio.h), but application
developers can also specify their own custom reverb and chorus/flange
style effects.

The way in which the data is manipulated is defined by a set of parameters
specified in blocks, where each block represents a single effects primitive. An
effect is constructed by attaching an arbitrary number of effects primitives
to a single delay line. There is one and only one input to this delay line, which
is the sum (slightly attenuated to minimize overflow) of the left and right
effects send busses. The contribution of a voice to this bus can be specified
by a call to alSynSetFXMix. This delay line is then operated on by the effect
specified in the fxType field of the synthesizer configuration structure.
The delay memory will be allocated from the audio heap by a call to
alInit, so the application must be sure that the audio heap is big enough
to contain the delay memory and its associated effects primitive structures.
The parameters for each primitive in the effect are specified in an array
which is passed to the audio initialization code. Each primitive consists of an
input offset, an output offset, coefficients specifying output contribution to
input and input contribution to output, chorus rate and depth parameters
which control modulation of the output offset, a DC normalized (unity gain
at DC) single pole low-pass filter, and finally, an output gain specifying how
much of the primitive's output is to be contributed to the final effect output.

The particular combination of values in each of the parameters for a
primitive specifies the function of that primitive as a whole within the effect.
For example, if ffcoef and fbcoef are the same except for a sign change,
the primitive will be an all-pass; if ffcoef and fbcoef are different, or one or
the other is zero, the primitive will be a filter of some kind. If both ffcoef and
fbcoef are zero, the primitive will be pure delay only, possibly modulated
and low-pass filtered.
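
The three cases above can be expressed as a small classification routine. This is only a reading aid: PrimKind and classify_primitive are hypothetical names that do not appear in the Audio Library, and the routine looks only at the two coefficients, ignoring chorus and low-pass settings.

```c
#include <stdint.h>

/* Hypothetical labels for the three cases described in the text. */
typedef enum {
    PRIM_DELAY,     /* both coefficients zero */
    PRIM_ALLPASS,   /* coefficients equal apart from sign */
    PRIM_FILTER     /* anything else */
} PrimKind;

PrimKind classify_primitive(int32_t fbcoef, int32_t ffcoef)
{
    if (fbcoef == 0 && ffcoef == 0)
        return PRIM_DELAY;
    if (fbcoef == -ffcoef)
        return PRIM_ALLPASS;
    return PRIM_FILTER;
}
```

For instance, a section with fbcoef = 9830 and ffcoef = -9830 is classified as an all-pass, while fbcoef = 12000 with ffcoef = 0 is classified as a filter.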

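Figure 19-2 Effects Primitives

[Figure: a single effects primitive drawn as a delay with feedback,
feedforward, and gain paths, or alternatively, as an operator on a
contiguous delay line. The store back into the delay line does not occur if a
tap position modulation (chorus) is part of the effect (see the chorus rate and
chorus depth parameters).]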



The function of the effects primitives can be thought of in two ways, the first
of which is as an individual signal processing block. The effect as a whole
would then be thought of as a set of concatenated and/or nested primitives
arranged to produce the overall desired effect. The second way of
conceptualizing the primitive is the way it is actually implemented, which is
to say, as an operator on a single longer delay line shared with all the other
primitives. Both conceptualizations are illustrated in Figure 19-2. By careful
selection of the effects parameters, a large class of cascaded/nested all-pass
and comb filter based effects can be created. (For a more detailed description
of this class of effects, see Bill Gardner's MIT master's thesis, "The Virtual
Acoustic Room", section 4.6, available from
http://sound.media.mit.edu/papers.html, and his Macintosh "Reverb"
program and documentation in the same location.)

Builders of custom effects will also discover that the effect specification
controls not only the nature of the effect, but the processing resources
consumed by the effect. Only those functions which are driven by non-zero
parameters actually generate any audio command operations in the RCP.
This gives application developers a great degree of flexibility in defining an
effect that is appropriate both in terms of sonic quality and efficiency. If a
developer wishes to use one of the pre-defined effects, they need only
specify that effect in the fxType field of the synthesizer configuration
structure. If, on the other hand, they wish to build their own effect, they
would specify an fxType of AL_FX_CUSTOM, and then allocate and fill in
the fields for the primitives. See the PR/apps/playseq source for one
example of how to use this capability to build a complex effect.

To create a custom effect, an application specifies the number of sections, the
overall length of the delay memory used by the total effect, and then the
input and output addresses, feedforward and feedback coefficients, gain,
chorus rate and depth, and low-pass coefficient for each section. Following
is a brief explanation of the significance of each parameter and what
processing actually takes place as a result of its inclusion. Although
parameters are interpreted in different ways, they are all stored in signed
32-bit numbers.
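
The layout just described implies a fixed array size: two values for the whole effect followed by eight values per section. The sketch below makes that arithmetic explicit. The names FX_HEADER_PARAMS, FX_SECTION_PARAMS, and fx_param_count are hypothetical and are not taken from libaudio.h.

```c
#include <stdint.h>

#define FX_HEADER_PARAMS  2   /* sections, length */
#define FX_SECTION_PARAMS 8   /* input, output, fbcoef, ffcoef, gain,
                                 chorus rate, chorus depth, low-pass coef */

/* Number of signed 32-bit values needed to describe a custom effect. */
int32_t fx_param_count(int32_t sections)
{
    return FX_HEADER_PARAMS + FX_SECTION_PARAMS * sections;
}
```

A one-section effect therefore needs 10 values and a three-section effect needs 26.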



Parameter Description 

The following two parameters are specified only once for the entire effect: 

sections: this parameter specifies the total number of sections in the effect. A
section is one primitive and its associated parameters.

length: this parameter specifies the total length of delay memory used by the
effect, and must be a multiple of 8 bytes. Since data is processed in blocks,
this parameter should be greater than or equal to the largest output offset
parameter PLUS the length of a processing buffer. This length is defined to
be 160 samples, or 320 bytes. If the last section of the effect has a non-zero
chorus rate parameter which corresponds to a slow modulation rate, and a
deep modulation depth (> 1 semitone), the total delay length may need to be
larger depending on the rate and depth of the chorus.
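
The length rule can be sketched as follows. This is an illustration under stated assumptions: min_delay_length and FX_BUFFER_BYTES are hypothetical names, offsets and length are taken to be in bytes as in the text, and the extra headroom a deep, slow chorus may need is not modeled.

```c
#include <stdint.h>

#define FX_BUFFER_BYTES 320   /* one processing buffer: 160 samples */

/*
 * Smallest length that satisfies the rule in the text: the largest
 * output offset plus one processing buffer, rounded up to a multiple
 * of 8 bytes.
 */
int32_t min_delay_length(const int32_t *output_offsets, int32_t sections)
{
    int32_t largest = 0;
    int32_t i;

    for (i = 0; i < sections; i++) {
        if (output_offsets[i] > largest)
            largest = output_offsets[i];
    }
    return (largest + FX_BUFFER_BYTES + 7) & ~7;
}
```

Choosing a length larger than this minimum is harmless apart from the additional audio heap it consumes.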
The rest of these parameters constitute one processing element, so there
must be one set of these parameters for each section specified by the sections
parameter.

The following two address parameters must be positive and must be on 8
byte (or 4 sample) boundaries. The application playseq.c shows an easy
way to specify addresses in the convenient unit of milliseconds which are
properly aligned.
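
One way to obtain aligned addresses from milliseconds is sketched below. This is not the method used in playseq.c: ms_to_offset is a hypothetical helper that converts a time to a sample offset at a given sample rate and then rounds down to a 4 sample boundary.

```c
#include <stdint.h>

/*
 * Hypothetical helper: converts milliseconds to a sample offset at
 * the given sample rate, rounded down to a multiple of 4 samples
 * (8 bytes of 16-bit data).
 */
int32_t ms_to_offset(int32_t milliseconds, int32_t sample_rate_hz)
{
    int64_t samples = (int64_t)milliseconds * sample_rate_hz / 1000;

    return (int32_t)(samples & ~(int64_t)3);
}
```

Rounding down shortens the delay by at most three samples, which is inaudible for reverb and echo times.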

input: this parameter specifies the address of the input of this section of the
effect. This address must be on a 4 sample (or 8 byte) boundary.

output: this parameter specifies the address of the output of this section of the
effect. This address must be on a 4 sample (or 8 byte) boundary.

The following three parameters, along with the lpfilt coef parameter, are
interpreted as signed 16-bit fractional fixed point values. The upper sixteen
bits should be sign extended:

fbcoef: this parameter specifies the coefficient of the feedback portion of the
section. If this parameter is zero, no action takes place.

ffcoef: this parameter specifies the coefficient of the feedforward portion of
the section. If this parameter is zero, no action takes place. If the chorus rate
parameter is non-zero (because it is not possible to store the loaded output
back into the delay line since it is not the same length), the ffcoef parameter
controls how much of the input to add to the interpolated output, allowing
flange type effects.

gain: this parameter specifies how much of this primitive's output to
contribute to the total effect output, and can be thought of as a 'tap' value. If
zero, no multiply is performed. Note that at least one section of the effect
must have a non-zero gain value for the effect to be heard. If no section of an
effect has a non-zero gain value, then no effect output will be heard.
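
The fixed point format used by fbcoef, ffcoef, gain, and the low-pass coefficient can be illustrated with a conversion routine. This is a sketch, not library code: coef_from_float is a hypothetical name, and scaling by 32768 with truncation and clamping is one assumed convention for producing signed 16-bit fractional values.

```c
#include <stdint.h>

/*
 * Hypothetical helper: converts a fraction in the range -1.0 to 1.0
 * into a signed 16-bit fractional value held in a sign extended
 * 32-bit integer.
 */
int32_t coef_from_float(double value)
{
    double scaled = value * 32768.0;

    if (scaled > 32767.0)
        scaled = 32767.0;      /* largest positive value, 0x7fff */
    if (scaled < -32768.0)
        scaled = -32768.0;
    return (int32_t)scaled;    /* truncates toward zero */
}
```

Under this convention a coefficient of 0.3 becomes 9830, -0.3 becomes -9830, 0.625 becomes 0x5000, and 1.0 is clamped to 0x7fff.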

chorus rate: this parameter specifies the modulation frequency of the output
tap position of the delay line, i.e., how quickly the tap position will be
modulated. The value of this parameter is (frequency/sample rate)*2^25.
For example, a modulation frequency of 0.5 Hz at a synthesizer sample rate of
44.1 kHz would be (0.5/44100)*33,554,432 = 380 (rounded down).
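
The chorus rate formula translates directly into code. The function name chorus_rate_param is hypothetical; the arithmetic is the formula given in the text, with the result truncated to an integer.

```c
#include <stdint.h>

/*
 * Computes the chorus rate parameter from a modulation frequency and
 * the synthesizer sample rate: (frequency / sample rate) * 2^25.
 */
int32_t chorus_rate_param(double mod_freq_hz, double sample_rate_hz)
{
    return (int32_t)(mod_freq_hz / sample_rate_hz * 33554432.0);
}
```

Calling chorus_rate_param(0.5, 44100.0) reproduces the worked example and yields 380.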
chorus depth: this parameter specifies the modulation depth, or pitch change,
of the effect. The parameter is specified approximately in hundredths of a
cent. So a modulation depth of +/-25 cents, or a quarter of a semitone, would
be 2500. The approximation to cents is good over the range useful for
musical chorusing and flanging, i.e., less than a few semitones. The error at
1 semitone (100 cents) is about 3 cents and at 3 semitones is about 30 cents.
If you wish to know the "exact" value (in cents) of the modulation depth,
use the following equation:



\[
\text{cents} = \frac{1200}{\ln(2)} \cdot \ln\left(1 - \frac{\text{chorusdepth}}{120{,}000/\ln(2)}\right)
\]



lpfilt coef: this parameter specifies the single pole low-pass filter coefficient.
The derivation of this value as a function of frequency and sample rate can
be found in numerous signal processing texts, and is left as an exercise to the
reader (doncha hate that). Generate a table once and forget about it. Only
positive values will actually be low-pass. Negative values will generate DC
normalized boost at high frequencies, causing possible overflow.

Armed with this knowledge about primitive parameters, let's look at some
example effects:

Figure 19-3 A simple echo effect

[Figure: a single 179 ms delay with a feedback coefficient of .36]
The effect in Figure 19-3, which is a simple echo effect and can be selected
using AL_FX_ECHO, would be implemented with the following
parameters:

#define ms *(((s32)((f32)44.1))&~0x7)

param[0] = 1;        /* the number of sections in this effect */
param[1] = 200 ms;   /* total allocated memory */
param[2] = 0;        /* input is beginning of delay line */
param[3] = 179 ms;   /* output location in delay line */
param[4] = 12000;    /* fbcoef of about .36 */
param[5] = 0;        /* no feed forward coefficient */
param[6] = 0x7fff;   /* full gain 1.0 - 1/2^15 */
param[7] = 0;        /* no chorus rate */
param[8] = 0;        /* no chorus depth */
param[9] = 0;        /* no low-pass filter */

This is, in fact, the echo effect implemented when AL_FX_ECHO is specified
in the fxType field of the synthesizer configuration structure.

Let's try something a little more interesting:

Figure 19-4 A nested all-pass inside a comb effect

[Figure: three sections drawn in Gardner-style notation, with section 2
nested inside section 1. The parameters shown are:

section 1: input = 0 ms, output = 54 ms, fbcoef = 9830, ffcoef = -9830,
           gain = 0, chorus rate = 0, chorus depth = 0, lopass coef = 0
section 2: input = 19 ms, output = 38 ms, fbcoef = 3276, ffcoef = -3276,
           gain = 0x3fff (.5), chorus rate = 0, chorus depth = 0,
           lopass coef = 0
section 3: input = 0, output = 60 ms, fbcoef = 5000, ffcoef = 0,
           gain = 0, chorus rate = 0, chorus depth = 0,
           lopass coef = 0x5000 (.625)]

In Figure 19-4, we have used the more compact Gardner-style notation. Note
that section 2 is "nested" inside section 1. This effect, which is the
AL_FX_SMALLROOM effect, would be specified using the following
parameters:

param[0] = 3;         /* the number of sections in this effect */
param[1] = 100 ms;    /* total allocated memory */

/* SECTION 1 */
param[2] = 0;         /* input */
param[3] = 54 ms;     /* output */
param[4] = 9830;      /* fbcoef */
param[5] = -9830;     /* ffcoef */
param[6] = 0;         /* no out gain */
param[7] = 0;         /* no chorus rate */
param[8] = 0;         /* no chorus depth */
param[9] = 0;         /* no low-pass filter */

/* SECTION 2 */
param[10] = 19 ms;    /* input */
param[11] = 38 ms;    /* output */
param[12] = 3276;     /* fbcoef */
param[13] = -3276;    /* ffcoef */
param[14] = 0x3fff;   /* gain */
param[15] = 0;        /* chorus rate */
param[16] = 0;        /* chorus depth */
param[17] = 0;        /* low-pass filter */

/* SECTION 3 */
param[18] = 0;        /* input */
param[19] = 60 ms;    /* output */
param[20] = 5000;     /* fbcoef */
param[21] = 0;        /* ffcoef */
param[22] = 0;        /* gain */
param[23] = 0;        /* chorus rate */
param[24] = 0;        /* chorus depth */
param[25] = 0x5000;   /* low-pass filter */
Summary of Driver Functions

Table 19-5 Synthesizer Functions

Function                 Description
alSynNew                 Opens and initializes the synthesizer driver.
alSynDelete              NOT IMPLEMENTED
alSynAddPlayer           Adds a client player to the synthesizer.
alSynRemovePlayer        Removes a player from the synthesizer.
alSynAllocVoice          Allocates and returns a synthesizer voice.
alSynFreeVoice           Deallocates a synthesizer voice.
alSynStartVoice          Starts a virtual voice playing.
alSynStartVoiceParams    Starts a virtual voice with the specified parameters.
alSynStopVoice           Stops a virtual voice from playing.
alSynSetVol              Sets the volume for the specified voice.
alSynSetPitch            Sets the pitch for the specified voice.
alSynSetPan              Sets the pan values for the specified voice.
alSynSetFXMix            Sets the wet/dry effects mix for the specified voice.
alSynSetPriority         Sets the priority of the specified virtual voice.
alSynGetPriority         Returns the priority of the specified virtual voice.
alSynAllocFX             Allocates a new effect of the specified type to the
                         specified bus.
alSynFreeFX              NOT IMPLEMENTED
alSynGetFXRef            Returns a pointer to the FX structure.
alSynSetFXParam          Currently has no effect.

Writing Your Own Player



A Player is an Audio Library software object that works through the
Synthesis Driver to construct audio command lists. Both the Sequence
Player and the Sound Player are examples of Players.

A Player operates by signing into the driver and then responding to driver
callbacks with driver API calls described in the section "The Synthesis
Driver". The initialization procedure and the callback routine
are detailed below.

Initializing the Player

In order for your player to receive driver callbacks and to use the synthesis
driver voice functions, you must first add the player as a driver client. This
is accomplished with the alSynAddPlayer() call, which takes two
arguments: a reference to the synthesis driver, and a reference to the
ALPlayer structure that represents the player to be added. A reference to the
synthesis driver may be obtained from the Audio Library globals structure
alGlobals. The ALPlayer structure contains a reference to the voice handler
callback function and a pointer that the player can use.

Example 19-1 Player Initialization

typedef struct MyPlayer_s {
    ALPlayer node;
    /*
     * include other player specific state here
     */
} MyPlayer;

void playerNew(MyPlayer *p)
{
    /*
     * Initialize any player specific state here
     */

    /*
     * Sign into the synthesis driver so that the next time
     * alAudioFrame is called, it will call the
     * voiceHandler function.
     */
    p->node.next = NULL;
    p->node.handler = voiceHandler;
    p->node.clientData = p;
    alSynAddPlayer(&alGlobals->drvr, &p->node);
}

void playerDelete(MyPlayer *p)
{
    /*
     * remove this player from the synthesis driver
     */
    alSynRemovePlayer(&alGlobals->drvr, &p->node);
}

In the previous example, you'll notice that the player structure contains a
reference to voiceHandler. This field points to a callback procedure, of
type ALVoiceHandler, which the driver calls in the process of building the
audio command list.



Implementing a Voice Handler

When your application calls alAudioFrame(), the driver iterates through its
list of players, calling the player's voice handler functions at the appropriate
offset (which translates to time) in the command list.

Typically, the player maintains a time-based list of events which the voice
handler parses and translates into driver calls. The voice handler contributes
to the construction of the command list by making driver voice calls.

Note: Driver voice calls can be made only from within the voice handler
function.

The voice handler returns the time, in microseconds, for the next callback.

Example 19-2 The Voice Handler

ALMicroTime voiceHandler(void *node)
{
    MyPlayer *p = (MyPlayer *)node;

    /*
     * You can now make calls to the following synthesis
     * driver voice functions:
     *
     *     alSynAllocVoice()
     *     alSynFreeVoice()
     *     alSynStartVoice()
     *     alSynStopVoice()
     *     alSynSetVol()
     *     alSynSetPitch()
     *     alSynSetPan()
     *     alSynSetFXMix()
     *     alSynSetPriority()
     *     alSynGetPriority()
     *     alSynSetFXParam()
     */

    return 1000; /* call back in 1 millisecond (1000 microseconds) */
}






Implementing Vibrato and Tremolo 



Note: A full example of vibrato and tremolo implementation is given in the
latest version of the playseq demo. GenMidiBank.inst has examples of how
vibrato and tremolo would be set in the bank.

Vibrato and tremolo are implemented by providing three callback routines:
initOsc, updateOsc, and stopOsc. These routines act as the low frequency
oscillator (LFO) that is modulated against either pitch or volume. When the
sequence player determines that a note uses either vibrato or tremolo, it will
call initOsc, which will set a current value and return a delta time specifying
how long before it needs to update the value of the oscillator. After the delta
time has passed, updateOsc will be called, which will set a current value and
return a delta time until the next update. This will continue until the note
stops sounding, and at that time, stopOsc will be called, so that your
application can do any necessary cleanup.

What each routine does, and how it does it, is largely up to the application.
All the sequence player expects is a delta time until the next callback, and a
value to use as the current value. In addition, the sequence player provides a
mechanism for each note to have its own data, and for this data to be passed
to subsequent calls of updateOsc.

For vibrato or tremolo to be active, you must set the vibType or tremType of
the instrument in the .inst file. A value of zero (the default) in these fields
will be interpreted by the sequence player as either vibrato off or tremolo off.
Any non-zero value will be considered as on. In addition to the type, the
following fields can be used to specify parameters for the oscillator: vibRate,
vibDepth, vibDelay, tremRate, tremDepth, tremDelay. These values are
eight-bit values and can be used in whatever way the oscillator callbacks deem
appropriate.

When creating a sequence player, you must pass pointers to your callbacks
through the ALSeqpConfig struct. The following code fragment
demonstrates how to do this.

ALSeqpConfig seqc;

seqc.maxVoices   = MAX_VOICES;
seqc.maxEvents   = EVT_COUNT;
seqc.maxChannels = 16;
seqc.heap        = &hp;
seqc.initOsc     = &initOsc;
seqc.updateOsc   = &updateOsc;
seqc.stopOsc     = &stopOsc;

alSeqpNew(seqp, &seqc);



The initOsc routine

ALMicroTime initOsc(void **oscState, f32 *initVal, u8 oscType,
                    u8 oscRate, u8 oscDepth, u8 oscDelay);

The initOsc routine is the first callback to occur when a note is started and
either the vibType or tremType is non-zero. Vibrato and tremolo are handled
separately by the sequence player, so if an instrument has both vibrato and
tremolo, two calls will be made, one for each oscillator. When called, initOsc
is passed a handle, in which it may store a pointer to a data structure. This
pointer will be passed back to subsequent calls of updateOsc and stopOsc.
This is optional. The second argument is a pointer to an f32 that must be set
with a valid oscillator value. The remaining arguments are the oscType,
oscRate, oscDepth, and oscDelay. These values may be used as you wish.

Typically initOsc will allocate enough memory for its data structure, and
store a pointer to this memory in the oscState handle. This is optional
though, and if your oscillator doesn't have any state information it may not
need to do this. After performing any computation that it needs, the initOsc
routine returns a delta time, in microseconds, until the first call to
updateOsc. If a delta time of zero is returned, the sequence player interprets
this as a failure, and will not make any calls to either updateOsc or
stopOsc. If the initVal is changed, the new value will be used. If the initVal
remains unchanged, vibrato will default to a value of 1.0 and tremolo will
default to a value of 127.

If the oscillator is a vibrato oscillator, the returned value is multiplied against
the unmodulated pitch to determine the modulated pitch. A value of 1.0 will
have no effect, a value of 2.0 will raise the pitch one octave, and a value of 0.5
will lower the pitch one octave. If the oscillator is a tremolo oscillator, the
returned f32 should be an integer value between 0 and 127. This value will
be multiplied against the unmodulated volume to determine a modulated
volume. A value of 127 will be full volume, and a value of 0 will be silent.
The updateOsc routine

ALMicroTime updateOsc(void *oscState, f32 *updateVal);

The updateOsc routine will be called whenever the delta time returned by
either initOsc or the previous updateOsc call has expired. When called,
updateOsc is passed the value returned by initOsc in the oscState handle.
UpdateOsc should make whatever calculations it needs, set the new
oscillator value in updateVal, and return a delta time until the next time
updateOsc needs to be called. Valid oscillator values are the same as in the
case of initOsc.

The stopOsc routine

void stopOsc(void *oscState);

The main purpose of the stopOsc routine is to give the application the
opportunity to free any memory stored in the oscState. StopOsc is not called
until the note has completely finished processing. Even if your routine does
nothing, you should still have a stopOsc routine if you have an initOsc
routine.
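
The three callbacks can be sketched together as a simple triangle-wave vibrato. This is a self-contained illustration, not the playseq implementation: the typedefs stand in for the SDK types, the state pool replaces heap allocation, and the way oscRate, oscDepth, and oscDelay are interpreted (and the 16 ms update interval) are assumptions, since the text leaves their use to the application.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t ALMicroTime;   /* stand-ins for the SDK typedefs */
typedef float   f32;
typedef uint8_t u8;

#define OSC_POOL_SIZE 8
#define UPDATE_USEC   16000    /* assumed update interval */

typedef struct {
    int in_use;
    f32 phase;   /* position in the cycle, 0.0 to 1.0 */
    f32 step;    /* phase increment per update */
    f32 depth;   /* peak deviation of the pitch ratio */
} VibState;

static VibState osc_pool[OSC_POOL_SIZE];

/* Triangle wave: 0 at phase 0, +1 at 0.25, -1 at 0.75. */
static f32 triangle(f32 phase)
{
    if (phase < 0.25f) return 4.0f * phase;
    if (phase < 0.75f) return 2.0f - 4.0f * phase;
    return 4.0f * phase - 4.0f;
}

ALMicroTime initOsc(void **oscState, f32 *initVal, u8 oscType,
                    u8 oscRate, u8 oscDepth, u8 oscDelay)
{
    int i;

    (void)oscType;
    for (i = 0; i < OSC_POOL_SIZE; i++) {
        if (!osc_pool[i].in_use) {
            VibState *s = &osc_pool[i];

            s->in_use = 1;
            s->phase  = 0.0f;
            s->step   = (f32)(oscRate + 1) / 1024.0f;
            s->depth  = (f32)oscDepth / 2550.0f;   /* at most 0.1 */
            *oscState = s;
            *initVal  = 1.0f;                      /* unmodulated pitch */
            /* delay in milliseconds; never zero, which would mean failure */
            return (ALMicroTime)(oscDelay + 1) * 1000;
        }
    }
    return 0;   /* no free state: report failure */
}

ALMicroTime updateOsc(void *oscState, f32 *updateVal)
{
    VibState *s = (VibState *)oscState;

    s->phase += s->step;
    if (s->phase >= 1.0f)
        s->phase -= 1.0f;
    *updateVal = 1.0f + s->depth * triangle(s->phase);
    return UPDATE_USEC;
}

void stopOsc(void *oscState)
{
    ((VibState *)oscState)->in_use = 0;   /* return the state to the pool */
}
```

A tremolo oscillator would follow the same structure but set values between 0 and 127 rather than pitch ratios around 1.0.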
Chapter 20

Audio Tools

This chapter describes the various audio tools for the Nintendo 64. These
include: an instrument compiler, which can be used to prepare banks of
sounds and control information used by the sequence player and the sound
player; a set of tools to compress and decompress sound data for the
Nintendo 64 ADPCM format; and tools for converting and printing MIDI
files.
The Instrument Compiler: ic

The Nintendo 64 Audio Library synthesizes audio from MIDI events using
information contained in the .ctl and .tbl data files. These files, along with the
.sym file, are known collectively as bank files, and are created by the "ic"
tool.

The .tbl file contains the ADPCM compressed audio wavetable data.

The .ctl file contains information about how the wavetables are to be
synthesized. It includes information about the wavetable's envelope, pan
position, pitch, mapping to MIDI note numbers, and velocity values. For
more information about the format of the .ctl file, see the section "Bank Files"
in Chapter 15.

The .sym file contains the bank file's symbol information, and is used mainly
for development and debugging. It is used only by the audio bank tools, not
by the Audio Library.

Note: ic can also be used to collect sound effects into a single bank structure
for inclusion in the ROM. In this case some of the features of the Bank format
are unused (for example, Keymaps and Instrument parameters).

Invoking ic

Invoke ic by entering this command:
ic [-v] -o <output file prefix> <source file>
Table 20-1 ic Command Line Options

Command Line Option        Function
-v                         Turns on verbose mode, which causes
                           the compiler to produce a quantity of
                           largely useless information.
-o <output file prefix>    Specifies the prefix for the .ctl, .tbl, and
                           .sym files created by the compiler.
<source file>              The name of the file containing the
                           source code for the banks of instruments.



Writing ic Source Files

Instrument compiler source files consist of C-like definitions for the
collection of objects that make up the Bank. There are objects to represent
banks, instruments, sounds, keymaps, and envelopes. Each of these objects
is detailed below.

The Bank Object

A bank object, denoted by the keyword "bank," contains an array of
instruments, a sample rate specification, and an optional default percussion
instrument. In the example below, the bank defined as "GenMidiBank"
contains one instrument, called "GrandPiano," at instrument location 0. It is
intended to operate at 44.1 kHz.

bank GenMidiBank
{
    sampleRate = 44100;
    program [0] = GrandPiano;
}

Note: The General MIDI 1.0 Specification specifies that MIDI channel 10 is
the default drum or percussion channel. As a result, many General MIDI
sequences do not contain program change messages for channel 10. You can
specify the default instrument (program) for channel 10 as follows:

bank GenMidiBank
{
    sampleRate = 44100;
    percussionDefault = Standard_Kit;
    program [0] = GrandPiano;
}

The Sequence Player sets the default instrument for channel 10 messages to
be "Standard_Kit."

The Instrument Object

The instrument object, referenced by the bank object, contains the overall
volume and pan for the instrument as well as the list of sounds that make up
the instrument.

In the example below, the "GrandPiano" instrument contains eight sounds:
"GrandPiano00", "GrandPiano01", "GrandPiano02", "GrandPiano03",
"GrandPiano04", "GrandPiano05", "GrandPiano06", and "GrandPiano07".

The overall instrument volume is 127, or full volume, and is panned to the
position 64, which is center.



instrument GrandPiano
{
    volume = 127;
    pan = 64;

    sound [0] = GrandPiano00;
    sound [1] = GrandPiano01;
    sound [2] = GrandPiano02;
    sound [3] = GrandPiano03;
    sound [4] = GrandPiano04;
    sound [5] = GrandPiano05;
    sound [6] = GrandPiano06;
    sound [7] = GrandPiano07;
}
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The Sound Object 



The sound object specifies the volume and pan, keyboard mapping, and
envelope for the sound. It also specifies the AIFF-C sound file containing the
ADPCM compressed wavetable data. A description of the AIFF-C format
expected by ic (which is generated by the ADPCM encoding tools) is given
in the section titled "ADPCM AIFC Format" in Chapter 21.

Note: The Sequence Player multiplies the instrument volume with the
sound volume to get the overall volume, and the instrument pan with the
sequence pan to get the sound's overall pan.

In the example below, the GrandPiano00 sound specifies that the wavetable
data is to come from the file ../sounds/GMPiano_C2.18k.aifc. It is to be panned
center (64) at full volume (127) and arranged on the keyboard according to
the map specified in piano00key with the envelope specified in
GrandPianoEnv.



sound GrandPiano00
{
    use ("../sounds/GMPiano_C2.18k.aifc");
    pan      = 64;
    volume   = 127;
    keymap   = piano00key;
    envelope = GrandPianoEnv;
}

Keymaps and envelopes are described in the following sections.

Note: When using banks to collect sound effects, the keymap entry is not
necessary.

The Keymap Object

The keymap object, referenced by the sound object, specifies the range of 
MIDI velocities and key numbers that the sound is intended to cover. It is 
used by the Sequence Player to determine which sound to map to a given 
MIDI note number, and at what pitch ratio to play the sound.

In the example below, piano00key specifies a MIDI Note On message with
a velocity between 0 and 127 and a note number between 0 and 43.

In this example, the keyBase is 41, so a MIDI Note On message for key 41
triggers the sound that references this keymap at unity pitch. A MIDI Note
On message for key 42 triggers the same sound, but shifted up a half step in
pitch.

Note: You can set the keyBase value outside the range of keyMin to keyMax.
This is useful if you want to critically resample a wavetable to conserve
ROM space. You could, for instance, resample a wavetable from 44.1 kHz to
22.05 kHz and adjust the keyBase up an octave to compensate. Remember,
however, that quality degrades at large pitch shift ratios.

The detune parameter indicates the number of cents that is to be added to
the default tuning. A half step is equal to 100 cents.



keymap piano00key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 0;
    keyMax      = 43;
    keyBase     = 41;
    detune      = 0;
}
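The relationship between keyBase, detune, and playback pitch can be worked
through numerically. The following is a minimal Python sketch, not part of
the Audio Library. It assumes equal-tempered tuning (a half step is a ratio of
2^(1/12), consistent with the 100 cents per half step stated above), and the
function name pitch_ratio is hypothetical.

```python
def pitch_ratio(note, key_base, detune_cents=0):
    """Playback ratio for a MIDI note relative to the keymap's keyBase.

    Assumes equal temperament: 100 cents per half step, 1200 per octave.
    """
    cents = (note - key_base) * 100 + detune_cents
    return 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    # keyBase = 41, as in the keymap example above.
    print(pitch_ratio(41, 41))  # unity pitch: 1.0
    print(pitch_ratio(42, 41))  # one half step up, about 1.0595
    print(pitch_ratio(53, 41))  # one octave up: 2.0
```

Under the same assumption, raising keyBase by an octave (12 keys) halves the
ratio for every note, which is what compensates for a wavetable resampled
from 44.1 kHz to 22.05 kHz.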

The Envelope Object

The envelope object specifies the attack-decay-sustain-release (ADSR)
envelope, or volume contour, for a sound. Volumes are specified in the range
of 0 to 127, and the times are specified in microseconds.

In the example below, the sound's envelopes would ramp from 0 to 127 in
0 microseconds, decay to 0 in 400 milliseconds, wait for a MIDI Note Off, and
then release to 0 in 200 milliseconds. The decay portion of the envelope
decays to zero. For many acoustic instruments, especially percussion
instruments, this gives the most realistic envelope.

Note: The Sound Player uses envelopes in a slightly different way. See
Chapter 19 for details.
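To make the envelope shape concrete, here is a minimal Python sketch of the
volume contour between Note On and Note Off. It assumes linear ramps
between the envelope points, which the text above does not specify, and the
function name envelope_level is hypothetical. The sample call uses the 400
millisecond decay from the description (400000 microseconds).

```python
def envelope_level(t_us, attack_time, attack_volume, decay_time, decay_volume):
    """Volume (0..127) at t_us microseconds after Note On, before Note Off.

    Assumed shape: linear ramp 0 -> attack_volume over attack_time, then
    linear ramp attack_volume -> decay_volume over decay_time, then hold.
    """
    if t_us < attack_time:
        return attack_volume * t_us / attack_time
    t_us -= attack_time
    if t_us < decay_time:
        return attack_volume + (decay_volume - attack_volume) * t_us / decay_time
    return decay_volume


if __name__ == "__main__":
    # Instant attack to 127, then a 400 ms decay to 0.
    for t in (0, 100000, 200000, 400000):
        print(t, envelope_level(t, 0, 127, 400000, 0))
```

All times are in microseconds, matching the units of the envelope object.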



A Complete Example

The following example, taken from the General MIDI bank that is shipped
with the development software, defines a bank with one instrument, the
GrandPiano.



envelope GrandPianoEnv
{
    attackTime    = 0;
    attackVolume  = 127;
    decayTime     = 4000000;
    decayVolume   = 0;
    releaseTime   = 200000;
    releaseVolume = 0;
}

keymap piano00key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 0;
    keyMax      = 41;
    keyBase     = 51;
    detune      = 0;
}

sound GrandPiano00
{
    use ("../sounds/GMPiano_C2.18k.aifc");
    pan      = 64;
    volume   = 127;
    keymap   = piano00key;
    envelope = GrandPianoEnv;
}

keymap piano01key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 42;
    keyMax      = 49;
    keyBase     = 63;
    detune      = 0;
}

sound GrandPiano01
{
    use ("../sounds/GMPiano[...]");
    pan      = 64;
    volume   = 127;
    keymap   = piano01key;
    envelope = GrandPianoEnv;
}

keymap piano02key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 50;
    keyMax      = 57;
    keyBase     = 67;
    detune      = 0;
}

sound GrandPiano02
{
    use ("../sounds/GMPiano_F3.19k.aifc");
    pan      = 64;
    volume   = 127;
    keymap   = piano02key;
    envelope = GrandPianoEnv;
}

keymap piano03key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 58;
    keyMax      = 63;
    keyBase     = 72;
    detune      = 0;
}

sound GrandPiano03
{
    use ("../sounds/GMPiano_C4.22k.[...]");
    pan      = 64;
    volume   = 127;
    keymap   = piano03key;
    envelope = GrandPianoEnv;
}

keymap piano04key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 64;
    keyMax      = 69;
    keyBase     = [...];
    detune      = 0;
}

sound GrandPiano04
{
    use ("../sounds/GMPiano_G4.22k.aifc");
    pan      = 64;
    volume   = 127;
    keymap   = piano04key;
    envelope = GrandPianoEnv;
}

keymap piano05key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 70;
    keyMax      = 75;
    keyBase     = 84;
    detune      = 0;
}

sound GrandPiano05
{
    use ("../sounds/GMPiano_C5.22k.aifc");
    pan      = 64;
    volume   = 127;
    keymap   = piano05key;
    envelope = GrandPianoEnv;
}

keymap piano06key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 76;
    keyMax      = 81;
    keyBase     = 91;
    detune      = 0;
}

sound GrandPiano06
{
    use ("../sounds/GMPiano_[...].aifc");
    pan      = 64;
    volume   = 127;
    keymap   = piano06key;
    envelope = GrandPianoEnv;
}

keymap piano07key
{
    velocityMin = 0;
    velocityMax = 127;
    keyMin      = 82;
    keyMax      = 111;
    keyBase     = 99;
    detune      = 0;
}

sound GrandPiano07
{
    use ("../sounds/GMPiano_C6.18k.aifc");
    pan      = 64;
    volume   = 127;
    keymap   = piano07key;
    envelope = GrandPianoEnv;
}

instrument GrandPiano
{
    volume    = 127;
    pan       = 64;
    sound [0] = GrandPiano00;
    sound [1] = GrandPiano01;
    sound [2] = GrandPiano02;
    sound [3] = GrandPiano03;
    sound [4] = GrandPiano04;
    sound [5] = GrandPiano05;
    sound [6] = GrandPiano06;
    sound [7] = GrandPiano07;
}

bank GenMidiBank
{
    sampleRate  = 44100;
    program [0] = GrandPiano;
}
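The way the keymaps in this listing divide the keyboard among the sounds
can be checked with a short script. The sketch below is Python and is not
part of the development tools. It copies four of the key ranges from the
listing above, and it assumes that the first sound whose keymap contains
both the note and the velocity is the one selected; the selection order and
the function name find_sound are assumptions.

```python
# (sound, velocityMin, velocityMax, keyMin, keyMax, keyBase), from the listing.
# Only four of the eight keymaps are copied here, so some notes have no match.
KEYMAPS = [
    ("GrandPiano00", 0, 127, 0, 41, 51),
    ("GrandPiano01", 0, 127, 42, 49, 63),
    ("GrandPiano05", 0, 127, 70, 75, 84),
    ("GrandPiano07", 0, 127, 82, 111, 99),
]


def find_sound(note, velocity, keymaps=KEYMAPS):
    """Return (sound name, half steps from keyBase), or None if nothing matches."""
    for name, vel_min, vel_max, key_min, key_max, key_base in keymaps:
        if vel_min <= velocity <= vel_max and key_min <= note <= key_max:
            return name, note - key_base
    return None


if __name__ == "__main__":
    print(find_sound(45, 100))  # ('GrandPiano01', -18)
    print(find_sound(72, 64))   # ('GrandPiano05', -12)
```

A negative offset means the wavetable is played below its keyBase pitch.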
The ADPCM Tools: tabledesign, vadpcm_enc, vadpcm_dec



The ic tool requires wavetables to be compressed in ADPCM format before
they are included in a sound bank. ADPCM compression is accomplished
using the tabledesign, vadpcm_enc, and vadpcm_dec tools. These tools are
described below.

Note: The format described is used only as an interchange format between
the compression tools and the instrument compiler. It is not used to store
compressed sound data on the ROM.



tabledesign 

tabledesign reads an AIFC or AIFF sound file and produces a codebook
(written to standard output), which is used by the ADPCM encoder. The
codebook is a table of prediction coefficients which the coder selects from to
optimize sound quality. The procedure used to design the codebooks is
based on an adaptive clustering algorithm.

Invoking tabledesign

tabledesign [-s book_size] [-f frame_size]
            [-i refine_iter] aifcfile

Command-line options are described in Table 20-2.

Table 20-2   tabledesign Command Line Options

Command Line Option   Function

-s <value>            Value is the base 2 log of the number of entries in
                      the table. Currently up to 8 entries are supported,
                      so the value can range from 0 to 3. The default
                      value for this parameter is 2, giving 4 entries.
                      This seems to be adequate for most sounds.

-f <value>            Value is the size of the frames (in samples) used to
                      estimate predictors. Since the ADPCM encoder
                      operates on frames of 16 samples, this number should
                      be a multiple of 16. The default value is 16. The
                      main benefit of increasing the frame size is that
                      design time is reduced.

-i <value>            Value is the number of iterations used in the
                      refinement step of the clustering algorithm. The
                      default value is 2. Increasing this parameter
                      increases design time, with some possible
                      improvement in quality. The default is adequate for
                      most sounds.
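The option limits in Table 20-2 can be checked before the tool is run. The
following is a minimal Python sketch under stated assumptions: the helper
names tabledesign_args and codebook_entries are hypothetical, the lower
bound of 0 for -s is taken from the table as read here, and the file name in
the sample call is only an illustration.

```python
def codebook_entries(book_size):
    """Number of table entries: the -s value is a base 2 logarithm."""
    return 2 ** book_size


def tabledesign_args(aifc_file, book_size=2, frame_size=16, refine_iter=2):
    """Build a tabledesign argument list after checking the documented limits."""
    if book_size < 0 or book_size > 3:
        raise ValueError("-s must be in the range 0 to 3 (at most 8 entries)")
    if frame_size <= 0 or frame_size % 16 != 0:
        raise ValueError("-f should be a positive multiple of 16")
    return ["tabledesign", "-s", str(book_size), "-f", str(frame_size),
            "-i", str(refine_iter), aifc_file]


if __name__ == "__main__":
    print(codebook_entries(2))               # 4, the default table size
    print(tabledesign_args("piano.aifc"))    # defaults from Table 20-2
```

The argument list could be passed to a process launcher with its standard
output redirected to a file, since the codebook is written to standard output.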



vadpcm_enc 

vadpcm_enc encodes AIFC or AIFF sound files and produces a compressed
binary file, which is used by ic to prepare banks of sounds. The encoding
algorithm is based on a switched ADPCM algorithm which uses a codebook
to define a table of prediction coefficients. Coefficients from the table are
selected adaptively during encoding to give the best sound quality. The
Nintendo 64 compressed sound format currently supports a single loop
point, which should be defined in the input file's Instrument Chunk. The
codebook and loop-point definitions are embedded in the final output file.

Invoking vadpcm_enc

The vadpcm_enc tool is invoked as follows:

vadpcm_enc -c codebook [-t] [-l minLoopLength]
           aifcFile codedFile



Table 20-3   vadpcm_enc Command Line Options

Command Line Option   Function

-c <filename>         Define a file that contains the prediction
                      coefficient codebook constructed by tabledesign(1).

-t                    Truncate the encoded file after the loop end point.
                      The portion of the sound after the loop end point is
                      never used in audio playback.

-l <value>            Set the minimum loop length in the encoded file
                      (see Note below).

Note: The efficiency of wavetable synthesis is dependent on the length of
loops. Longer loop lengths can be synthesized more efficiently. A minimum
loop length can be set in the ADPCM encoder. The currently defined default
minimum loop length is 800 samples. This default length can be changed
(see above), with the absolute minimum being 16 samples. Loops shorter
than the minimum loop length are repeated until the total loop length is
larger than the minimum length. If possible, loops should be longer than a
single audio frame, which is equal to (SampleRate)/(FrameRate).
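The arithmetic in the note above can be illustrated with a short Python
sketch. It assumes that a short loop is extended by whole repetitions until
its total length is strictly larger than the minimum, which is one reading of
the note and may differ from the encoder's exact rule. The function names,
the 44100 Hz sample rate, and the 60 frames per second are illustrative.

```python
def repeated_loop_length(loop_length, min_loop_length=800):
    """Total loop length after a short loop is repeated past the minimum."""
    if loop_length <= 0:
        raise ValueError("loop_length must be positive")
    if loop_length >= min_loop_length:
        return loop_length  # not shorter than the minimum, left alone
    repeats = min_loop_length // loop_length + 1
    return repeats * loop_length


def samples_per_frame(sample_rate, frame_rate):
    """Length of one audio frame in samples: (SampleRate)/(FrameRate)."""
    return sample_rate / frame_rate


if __name__ == "__main__":
    print(repeated_loop_length(300))      # 900: three repetitions
    print(samples_per_frame(44100, 60))   # 735.0
```

With these illustrative rates, a loop of at least 736 samples would be longer
than a single audio frame.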



vadpcm_dec

vadpcm_dec decodes a sound file that has been encoded in the Nintendo 64
ADPCM format using vadpcm_enc, and writes it to standard output as raw
mono 16-bit samples.

Invoking vadpcm_dec

The vadpcm_dec tool is invoked as follows:

vadpcm_dec [-l] codedfile

Table 20-4   vadpcm_dec Command Line Options

Command Line Option   Function

-l                    If the sound has a loop, play the loop repeatedly
                      until a key is pressed on the standard input.

The MIDI File Tools: midicvt, midiprint & midicomp



midicvt 

The Audio Library plays only Type 0 Standard MIDI files. You can use
midicvt to convert from Type 1 (which are generally output by most MIDI
sequencers) to Type 0.

Invoking midicvt

midicvt is invoked as follows:

midicvt [-v] [-s] <input file> <output file>

Table 20-5   midicvt Command Line Options

Command Line Option   Function

-v                    turns on verbose mode

-s                    strips out any messages that are not used by the
                      Audio Library. These include text messages and
                      system exclusives.

<input file>          the name of a Type 0 or Type 1 Standard MIDI file.

<output file>         the name for the Type 0 output file.
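Whether a file needs to go through midicvt can be read from its header. The
sketch below is Python and relies on the Standard MIDI File header layout
(the "MThd" tag, a 32-bit length, then 16-bit format, track count, and
division fields, all big-endian). The function names are hypothetical and
nothing here is part of midicvt itself.

```python
import struct


def midi_file_type(data):
    """Return the format field (0, 1, or 2) of a Standard MIDI File header."""
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    length, fmt, ntracks, division = struct.unpack(">IHHH", data[4:14])
    if length < 6:
        raise ValueError("header chunk is too short")
    return fmt


def needs_midicvt(data):
    """True if the file is not Type 0 and so must be converted first."""
    return midi_file_type(data) != 0


if __name__ == "__main__":
    # A Type 1 header: three tracks, 480 ticks per quarter note.
    header = b"MThd" + struct.pack(">IHHH", 6, 1, 3, 480)
    print(midi_file_type(header), needs_midicvt(header))  # 1 True
```

Only the first 14 bytes of the file are needed for this check.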



midiprint

The midiprint tool prints a text listing of the time-based MIDI events in a
Type 0 or Type 1 Standard MIDI file.

Invoking midiprint

midiprint [-v] -o <output file> <input file>

Table 20-6   midiprint Command Line Options

Command Line Option   Function

-v                    verbose mode.

-o <output file>      the optional output file for the MIDI event text.

<input file>          the name of the Type 0 or Type 1 Standard MIDI file
                      to list.



midicomp 

The midicomp tool is used to compress MIDI files of either Type 0 or Type 1
to a format recognized by the compact sequence player.

Invoking midicomp

midicomp is invoked as follows:

midicomp <input file> <output file>

Table 20-7   midicomp Command Line Options

Command Line Option   Function

<input file>          the name of the Type 0 or Type 1 Standard MIDI file
                      to compress.

<output file>         the name to use for the output file.

Making files that will compact better

Different MIDI files will be compressed by different percentages, based on
the content of the files. All files (except very small files) should be
compressed at least somewhat. Because midicomp achieves compression by
recognizing patterns and then compressing these, the greatest amounts of
compression occur when the files are repetitive. Patterns and sections
created in a sequencer using cut and paste are the ones most likely to be
compressed.

Midi Receiving with Midi Daemon: midiDmon

Midi Daemon is no longer supported. All functionality from Midi Daemon is
now incorporated into Instrument Editor.

Instrument Editor

The tool Instrument Editor provides three primary uses. First, as an editor, it
allows realtime editing and auditioning of instrument banks and effects.
Second, as a player, it allows external MIDI devices to play back MIDI on the
Nintendo 64 Development Hardware. Third, as a profiler, it profiles and
measures audio resources that are being used during playback. With its
support for MIDI playback, the ie tool is intended to replace the
functionality of the Midi Daemon tool.

Instrument Editor is invoked with the command:

ie [-b <.inst file>] [-c <.cnfg file>] [-v]

Table 20-8   ie Command Line Options

Command Line Option   Function

-b <.inst file>       specifies the name of the instrument bank file to
                      open in the editor. If this option is not used, the
                      editor opens with a new .inst file.

-c <.cnfg file>       specifies the name of the configuration file used to
                      configure the N64 Audio Library used by ie.

-v                    turns on verbose mode (for debugging).



Editor 



The Editor portion of the ie tool is a simple application for editing .inst files
as well as effects. A Nintendo 64 development board does not have to be
present to open and edit .inst files. However, you will not be able to audition
your changes without the Nintendo 64.

Bank Editing 

The ie tool can read, write, and edit .inst files. .inst files contain a description
of a Nintendo 64 bank which can be compiled into actual Nintendo 64 bank
files with ic, the instrument compiler tool. The .inst bank description is
made up of several components such as instruments, sounds, envelopes, etc.
Each of these bank components, or assets, has one or more parameters
associated with it. For example, an instrument asset has volume, pan, and
bend range parameters associated with it, among others. Assets can also
reference each other in a sort of parent-child relationship. For instance, bank
assets reference instrument assets, so instruments are children of a bank.
Similarly, instrument assets reference sound assets, so sounds are children
of an instrument. Furthermore, if a child asset is never referenced by another
asset (i.e. it has no parent), it is called an orphan. So if an envelope asset is
never used by a sound asset, the envelope is an orphan and can be deleted
from the .inst file without affecting the bank.
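The orphan rule described above amounts to a set difference between the
assets that exist and the assets that some parent references. The following
Python sketch shows the idea on plain dictionaries. It does not read .inst
files, and the function name find_orphans and the asset name UnusedEnv are
hypothetical.

```python
def find_orphans(assets, references):
    """Return the names of assets that no parent references.

    assets:     names of all assets of one type (for example, envelopes).
    references: dict mapping each parent name to the child names it uses.
    """
    used = {child for children in references.values() for child in children}
    return sorted(name for name in assets if name not in used)


if __name__ == "__main__":
    envelopes = ["GrandPianoEnv", "UnusedEnv"]
    sound_refs = {
        "GrandPiano00": ["piano00key", "GrandPianoEnv"],
        "GrandPiano01": ["piano01key", "GrandPianoEnv"],
    }
    print(find_orphans(envelopes, sound_refs))  # ['UnusedEnv']
```

An asset reported here could be deleted without changing what the bank
plays, since nothing refers to it.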

Viewing Assets

The editor displays all these bank assets and supports viewing and editing
the parent-child relationships within a bank. The editor's view contains
several folders, one for each type of bank asset. Each folder contains a list of
all the assets of the given type. For example, to view a bank's instruments,
simply select the instrument's folder tab to open up the instrument folder.
The folder contains a list of all the names of the instruments as well as
columns for each of an instrument's parameters, such as volume, pan,
priority, and bend range. Each asset also contains an icon column which
helps identify the type of asset.

Editing Assets 

To edit the value of an asset's parameters, simply click on the corresponding
column to activate the default editing for the parameter. Names are always
text edited. Numbers can be scrolled up or down to increase or decrease
their value. References to other child assets are edited with popup menus.
However, all assets can be text edited by clicking on them with the "Alt" key
held down. This pops up a text edit field which can be moved around from
field to field using the arrow keys and the "Alt" key. (Without the Alt key,
the arrow keys move the cursor within the text field.) Values won't be
accepted if the value is out of range or is illegal. Use the "ESC" key to cancel
any text editing. Note that some fields cannot be edited (e.g. a wavetable's
sample rate) and only display information. Icon fields are used for a variety
of purposes such as asset selection, asset audition, and others. Integer fields
can be double-clicked to quickly set the value to a preset default value.

Viewing and Editing Children

Some of the assets contain a "#" column. This column displays the number
of children that the asset has. If the asset has one or more children,
double-clicking on the "#" column will open up the parent and display its
children. Since the children have different parameters than the parent, only
the common fields such as the name field are displayed for children.
Double-clicking the "#" column again will close the asset. The "#" field can
be edited by clicking on the field. This will bring up a popup menu showing
a list of assets that are currently not children of the selected asset. Choosing
one of these assets will add it to the parent's list of children. Double-clicking
on the icon of a child will automatically open up the children's folder for
editing of their parameters. For example, double-clicking an instrument's
sound will open up the sound folder for editing. Likewise, double-clicking
a sound's envelope will open up the envelope folder for editing.

Auditioning Assets 

In order to audition assets, the current bank being edited must be "valid"
and must be "online" on the Nintendo 64. For a description of what it means
for a bank to be valid and online, see the Nintendo 64 Playback section.
When a bank is online, bank assets can be auditioned by clicking on their
icon. Pressing the button down sends a MIDI note on event. Releasing the
button sends a MIDI note off event. This makes it easy to audition the
sustain portion of a sound. Currently, auditioning instrument assets will
always play a C4 note. Auditioning sounds, keymaps, envelopes, and
wavetables will play the asset's parent instrument at the sound's key base.
Note that if the keymaps for an instrument's sounds are not specified and
ordered properly, an auditioned asset may not get mapped to the correct
sound. This is a potential source of confusion when auditioning assets, so
make sure that the auditioned sound's keymap is correct and complete
before auditioning.

The File Menu 

The file menu contains commands for opening, closing, and saving .inst 
files. The "Open" command brings up a dialog for selecting a .inst file to 
edit. Only one .inst file can be open at a time so choosing "Open" while 
another .inst file is currently open will first close the file before opening a
new one. The "Close" command removes all bank assets and allows a new
file to be edited. The "Save" and "Save As" commands write the file to disk.

The Edit Menu

The edit commands are currently not supported.

The Asset Menu

The Asset menu contains commands for inserting and deleting assets.
Selecting the insert command will create a new asset and place it at the end
of the list. The asset will automatically have default parameter values. To
insert an asset in the middle of the list, select the asset where you want the
asset to appear and select the insert command. The selected asset will
appear below the newly created one. To delete assets, simply select one or
more assets and select the delete command. A shortcut for creating an asset
and adding it to a parent is provided by the "Insert Child" command. This
command will insert a new child asset to the selected parent. The "Remove
Child" command removes the selected child(ren) from the parent, but does
NOT delete them. Choose the "Delete" command to remove and delete
them. Finally, the "Import" command allows importing of other .inst files as
well as .aiff-c files. This is currently the only way to create wavetable assets.

The Select Menu 

The select menu contains useful commands for selecting certain types of
assets. The "Select Parents" command will select all the parents of the
currently selected asset. This command works only if exactly one asset is
selected. For example, if a keymap is selected, the "Select Parents"
command will select all the sound assets that use the given keymap and will
automatically display the sound folder. The "Select Orphans" command
will select all the folder's assets that do not have any parents. This is useful
for determining which assets aren't being used anywhere and which can be
deleted.

Effects

The ie tool supports creating, editing, and auditioning effects on the
Nintendo 64. Since effects are tightly coupled to the N64 Audio Library, they
will only appear for editing if N64 development hardware is present.
Otherwise, only bank components can be edited. If N64 development
hardware is present, ie will automatically create built-in effects for
auditioning and editing. These effects are small room, big room, chorus,
flange, and echo. In addition to the built-in effects, custom effects can be
created from scratch.

Effects Viewing

Similar to banks, effects are made up of two components, the effect asset and
the effect section asset. Simple effects may contain only one or two sections,
while more complicated effects may contain eight or more sections. Similar
to banks, effects are parents to their section children. As a result, effects can
be viewed just like bank assets can be viewed. All effects parameter values
are displayed in their native data format (the format that the N64 requires
them in) except for the delay fields (length, input, and output). The delay
parameters are displayed in milliseconds and must be converted to samples
and aligned to an 8 sample boundary before being used to configure a game.
(ie does this automatically when it loads an effect for auditioning.)

Effects Editing 

Effects and effect sections can be edited just like bank assets. However, there
are some special considerations when editing effects.

First, the delay parameters (length, input, output) are displayed and edited
in msecs. The N64 requires that these values occur at 8 sample boundaries
and that the length is greater than both the input and output delays by about
160 samples (depending on the chorus rate). (See the section on audio effects
for a more detailed explanation of the 160 sample restriction.) The ie tool
automatically enforces the 8 sample boundary rule when it loads the effect
on the N64; however, it does not enforce the 160 sample rule. Be careful
when editing input or output delays so that they do not approach within 160
samples (depending on the chorus rate) of the delay line's length. Normally,
if this limit is exceeded, you will hear artifacts in the audio such as clicks and
pops.

Secondly, when an effect is "online" (i.e. it is loaded into the N64), the effect's
length parameter cannot be edited. In addition, you cannot insert or delete
sections to an online effect. In order to make these changes to an online
effect, you must offline the effect first.

Thirdly, effect sections can only have one parent. Once a section is being
used by a parent effect, it will not be available for other effects to use.

Finally, to use chorus or the low pass filter, you must make sure that the
respective parameters are non-zero before loading the effect. The Audio
Library will not allocate the required memory to implement chorus or the
low pass filter if the parameters are initially zero (this saves unneeded
memory).
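The first consideration, the conversion of delay times to samples and the 160
sample margin, can be checked by hand before an effect is loaded. The
following is a minimal Python sketch. It assumes that alignment rounds down
to the 8 sample boundary (the text does not say which direction ie rounds),
it treats 160 samples as a fixed margin although the real figure depends on
the chorus rate, and the 44100 Hz output rate and the function names are
illustrative.

```python
def ms_to_aligned_samples(ms, output_rate):
    """Convert a delay in milliseconds to samples on an 8 sample boundary.

    Rounds down to the boundary, which is an assumption.
    """
    samples = int(ms * output_rate / 1000)
    return samples - samples % 8


def delays_fit(length_ms, input_ms, output_ms, output_rate, margin=160):
    """True if both delays stay at least `margin` samples short of the length."""
    length = ms_to_aligned_samples(length_ms, output_rate)
    return all(length - ms_to_aligned_samples(delay, output_rate) >= margin
               for delay in (input_ms, output_ms))


if __name__ == "__main__":
    print(ms_to_aligned_samples(325, 44100))   # 14328
    print(delays_fit(325, 320, 100, 44100))    # True
    print(delays_fit(325, 323, 100, 44100))    # False
```

At this output rate 160 samples is a little under 4 milliseconds, so input or
output delays within a few milliseconds of the length deserve a second look.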

Effects Auditioning

Initially, no effects are loaded to the N64. In order to load an effect and
make it "online", double-click the desired effect's icon. To offline the effect,
double-click it again or double-click another effect. When an effect is placed
online, the N64 must be fully reconfigured since the Audio Library must be
initialized with the effect. This may take a few seconds since it must reload
the entire bank to the N64. Once the effect is online, its icon should appear
in red to indicate that it is online. From now on, auditioned bank assets will
be played through the effect. Note that the wet/dry amount can be
controlled for each MIDI channel by sending an FX1 control message to the
channel.

Effects Saving and Restoring 

Currently, effect assets cannot be saved to disk. This is because there is no
standard ".fx" file like there is an ".inst" file for bank assets. However, effects
can be restored from disk with a configuration (.cnfg) file. (See the section on
the N64 Configuration for a description of the configuration file.) Since the
Audio Library treats effects as part of the configuration data, you can edit
the configuration file to include a custom effect. An effect is defined with the
key word "REVERB_PARAMS" and is followed by a bracketed {...} set of
parameters describing the effect and its sections. Below is an example of an
effect with 8 sections and a total delay line length of 325 msecs. Note that
comments are bracketed by /* ... */.

REVERB_PARAMS = {
    /* sections    length */
       8,          325,
    /*                                      chorus  chorus  fltr */
    /* input  output  fbcoef  ffcoef  gain  rate    depth   coef */

Nintendo 64 Player and Profiler

When ie is launched, it automatically looks for an N64 development board
and if it finds one, it will boot it up with MIDI playback code and profiling
code. If it can't find the N64 board or if it fails to boot it up, it will report an
error and ie will not be able to audition any instruments or edit effects. In
addition, ie will also boot up the gload tool which acts as a print server for
any error or debugging messages. This is useful for detecting when an audio
library resource has been exceeded. If another gload is running at the time
that ie is launched, ie will fail to run.

Nintendo 64 Configuration

The Nintendo 64 Audio Library is configured using default configuration
information. This default configuration can be edited either by using the
configuration dialog or by specifying a configuration file on the command
line when the tool is run. For information on how to use the configuration
dialog see the section on the Nintendo 64 Menu. To configure the tool using
a configuration file, simply specify the file on the command line. The
configuration file should contain reserved words that specify the values of
certain configuration parameters, such as output rate or the number of
available virtual voices. For an example of a .cnfg file and its reserved words,
refer to the file /$ROOT/usr/src/PR/assets/banks/ie.cnfg.

Nintendo 64 MIDI Playback

Once it is up and running, the Nintendo 64 waits for incoming MIDI
messages. MIDI messages can be sent from an external MIDI device or from
the ie tool itself. In order for the Nintendo 64 code to respond to the MIDI
messages, it needs to have a valid bank downloaded to it by ie. When ie is
launched with a new file, there is no bank in the editor and the Nintendo 64
will be "offline", which means it does not have a bank installed. The
profiling screen on the Nintendo 64 monitor indicates the state of the bank
at the top of the screen. As soon as ie has a valid bank in the editor, it will
download the bank data and the Nintendo 64 will then be "online" and
able to respond to MIDI events. As the bank is edited, ie continually
checks to see whether the bank is still valid, and as soon as the bank fails
to be valid, it will take the bank offline. The reason for this is simply that the
Audio Library requires complete and correct bank data in order to work
properly. A bank is determined to be valid if the following conditions are
met:

1) a bank asset exists

2) the bank contains at least one instrument

3) the bank's instruments contain at least one sound

4) the bank's sounds must all have keymaps, envelopes, and wavetables
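The four conditions can be expressed as a single check. The sketch below is
Python operating on a nested dictionary; that layout is hypothetical and is
not how ie stores a bank. It also reads condition 3 as applying to every
instrument in the bank, which is an interpretation of the list above.

```python
def bank_is_valid(bank):
    """Apply the four validity conditions to a dict model of a bank.

    bank is None, or {"instruments": [{"sounds": [{"keymap": ...,
    "envelope": ..., "wavetable": ...}, ...]}, ...]}.
    """
    if bank is None:                       # 1) a bank asset exists
        return False
    instruments = bank.get("instruments", [])
    if not instruments:                    # 2) at least one instrument
        return False
    for instrument in instruments:
        sounds = instrument.get("sounds", [])
        if not sounds:                     # 3) at least one sound
            return False
        for sound in sounds:               # 4) keymap, envelope, wavetable
            if not all(sound.get(key) for key in ("keymap", "envelope", "wavetable")):
                return False
    return True


if __name__ == "__main__":
    sound = {"keymap": "piano00key", "envelope": "GrandPianoEnv",
             "wavetable": "GMPiano_C2.18k.aifc"}
    print(bank_is_valid({"instruments": [{"sounds": [sound]}]}))  # True
    print(bank_is_valid({"instruments": [{"sounds": []}]}))       # False
```

A check of this kind run after every edit mirrors the behavior described
above, where the bank goes offline as soon as it stops being valid.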

When a bank is online, bank assets can be auditioned from the editor by
clicking on their icon. MIDI messages can also be sent from external devices.
To use external devices, a MIDI interface must be properly attached to one
of the host computer's serial ports and it must be properly configured using
the startmidi tool.

Nintendo 64 Profiling 

The Nintendo 64 screen displays current readings for various audio 
resources. These readings are useful to monitor when playing back a
sequence targeted for the Nintendo 64 from an external MIDI sequencer.
The readings will measure how much of each resource is used in order to
play back the sequence. The profiler keeps track of the following resources:

Table 20-9   ie Profiled Resources

Profiled Resource   Description

cmds                the number of audio commands for a frame of samples.
                    Profiles both current and maximum values.

syn upds            the number of parameter update blocks used by the
                    synthesis driver to store changes in control
                    parameters. The number of available update blocks is
                    specified during the Audio Library configuration.
                    Profiles both current and maximum values.

seqevts             the number of event message blocks used by the
                    sequence player. The number of available message
                    blocks is specified during the Audio Library
                    configuration. Profiles both current and maximum
                    values.

DMAs                the number of DMA requests made during an audio
                    frame. The maximum number of DMA requests is
                    specified during the audio system configuration.
                    Profiles both current and maximum values.

DMAbufs             the number of DMA buffers needed during an audio
                    frame. The number of available DMA buffers is
                    specified during the audio system configuration.
                    Profiles both current and maximum values.

Vcs                 this graph profiles virtual voice usage during
                    playback. Each pixel represents one used virtual
                    voice. The number of available virtual voices is
                    specified during the Audio Library configuration.
                    The maximum number of virtual voices used is
                    displayed in the corner of the voice graph.

RSP                 this graph profiles the percentage of a frame period
                    being used to execute the audio synthesis microcode
                    on the RSP.

CPU                 this graph profiles the percentage of a frame period
                    being used by the CPU during the call to
                    alAudioFrame.

output meters       this profiles the peak output level of the final
                    output samples that are sent to the audio DACs. The
                    scale is in dBs with the top of the meter at 0 dB and
                    then decreasing in 3 dB increments per LED. Signal
                    levels above [...] dB are indicated by a yellow
                    caution LED. Signal presence is indicated by the
                    bottom LED (i.e. any non-zero sample will turn on the
                    bottom LED). Signal clipping is indicated by a red
                    LED that appears above the meter. Note that the clip
                    detector does not detect true clipping, rather it
                    detects when a sample magnitude value of 0x7fff
                    appears. This could be a legitimate value from a
                    normalized sound or it could be a limited value
                    caused by overflow.

Be aware that the resource demands for audio synthesis vary on a frame by
frame basis. This is because it must share the processing resources with the
other parts of the system. This means that the profile values will vary each
time a given sequence is played. Therefore, the readings should be used as
an approximation, not as an accurate measurement of resource usage. Also
note that the CPU measurements can be affected by any debugging
messages produced by the audio library. In addition, the N64 code was not
optimized by gcord and so is not displaying best case performance.

The Nintendo 64 Menu

If the N64 development board is available, an N64 menu will appear in the
editor. This menu provides control over some of the N64 functionality. The
"Clear Profile Values" item resets the MIDI player and causes all the
maximum values to be reset to zero. The "Configure Hardware" menu
brings up a dialog which can be used to set some of the Audio Library
configuration parameters. See Table 20-10 on page 428 for a description of
the various configuration parameters. After setting the configuration
parameters, press the okay or apply button to make the changes take effect.
Reconfiguration may take a few seconds since any open bank file must be
fully reloaded to the N64. Configurations can be saved and reloaded at a
later time using the "Save Configuration..." and "Load Configuration..."
commands. These commands ask you to name the configuration file you
want to save or load before proceeding. Finally, the "Reset Hardware"
command resets the entire N64 hardware, forcing the N64 code to be
reloaded and the audio reconfigured. Use this command to try to recover
the N64 if it crashes for any reason.

Here is a description of each of the configuration parameters:

Table 20-10 ie Configuration Parameters

Configuration
Parameter           Description

output rate         the requested sampling rate of the audio interface in Hz.

samples per         the requested number of samples to be synthesized per audio
frame               frame. For maximum efficiency use a value that is a multiple
                    of 160 samples (e.g. 640). A larger number means a slower
                    frame rate while a smaller number means a faster frame rate.
                    This number along with the output rate can be used to
                    simulate a game running at 60 Hz or 30 Hz. For example, at an
                    output rate of 44100 Hz, setting this value to 735 will
                    produce a frame rate of 60 Hz.

max commands        the maximum number of ABI commands that can be executed
per frame           per audio frame. This directly corresponds to the size of the
                    audio command list buffer that stores the ABI commands.

DMA buffers         the number of available buffers for performing DMA requests.

DMA buffer size     the size of each DMA buffer. Smaller buffer sizes normally
                    require more DMA requests while larger buffer sizes normally
                    require fewer DMA requests.

max DMA             the maximum number of DMA requests that can be made. This
requests            value directly affects the size of the DMA message queue set
                    up by the N64 code.

# frames to hold    the number of frames that must elapse before the N64 code
DMA buffers         will free a DMA buffer for reuse. While the buffer is being
                    "held", its samples remain available for other requests that
                    may ask for the same samples. In some cases, the same
                    samples may be used over and over again, so holding them in
                    memory is faster than performing a DMA from ROM.

max virtual         the maximum number of virtual voices available to both the
voices              synthesis driver and the MIDI player.

max physical        the maximum number of physical voices available. If this is
voices              less than virtual voices then voice stealing is enabled.

max control         the maximum number of control updates each physical voice
updates             is able to store. Control updates store data such as volume
                    changes, pitch changes, etc. This value directly affects the
                    memory allocated for control data.

max channels        the maximum number of channels available for MIDI
                    messages. Normal MIDI systems support 16 channels. This
                    affects how much memory is allocated to store channel
                    information.

max events          the maximum number of event updates that the synthesizer is
                    able to store. Event updates store sequence data such as start
                    commands, MIDI commands, etc. This value directly affects
                    the memory allocated for event data.
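
The relationship between output rate, samples per frame, and frame rate
described in the table can be checked with a few lines of C. This is only a
sketch of the arithmetic; the function names are hypothetical and are not
part of the ie tool or the Audio Library.

```c
/* Audio frames per second for a given output rate (Hz) and frame size
 * (samples per audio frame). Returns 0.0 for a non-positive frame size.
 * Hypothetical helper, for illustration only. */
double audio_frame_rate(int output_rate_hz, int samples_per_frame)
{
    if (samples_per_frame <= 0)
        return 0.0;
    return (double)output_rate_hz / (double)samples_per_frame;
}

/* Samples per frame needed to reach a target frame rate (Hz).
 * Integer division truncates any remainder. */
int samples_for_frame_rate(int output_rate_hz, int frames_per_second)
{
    if (frames_per_second <= 0)
        return 0;
    return output_rate_hz / frames_per_second;
}
```

With an output rate of 44100 Hz, 735 samples per frame gives 60 frames per
second, and a 30 Hz game would use 1470 samples per frame.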



Note that since audio sample DMA is implemented by the game application,
the DMA configuration parameters may not be applicable to your game.
Keep this in mind when setting these parameters.



Bugs

For a list of known bugs and problems, consult the man page for the ie tool.
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Midi and the Indy 



Before using Midi Daemon, you will have to correctly configure your Indy
for midi. Because there have been changes in both the midi software and the
serial ports on the motherboard, it is recommended that only a recently
purchased Indy and the latest software release be used.

Motherboards must meet a minimum version requirement. To determine the
version of your motherboard, open your Indy, and in the front right of the
motherboard you will find a version number. The first four digits should be
8123 and they are followed by three more digits that are the version number.
The revision number that follows the version number is not important. If
you find that you have an Indy with an older version motherboard, contact
SGI field service for a replacement board.

The Indy uses a standard Macintosh Computer Midi Interface. Because there
are differences between the interfaces sold for the Mac (particularly in the
voltage levels necessary), not all Mac Midi Interfaces will work correctly.
Insufficient testing has been done to recommend a particular brand. We have
seen cases where interfaces that do not supply their own power, but instead
draw their power from the Indy serial port, will drop midi messages sent
back to back. For that reason we do recommend that you purchase a midi
interface that has its own power supply.

At present we are recommending the installation of the DMedia 5.5
package, which contains the necessary midi drivers.

To configure your Indy for midi, you can use either of two methods. The first
method is to run startmidi. This utility is started from the command line,
with arguments specifying which midi ports to run on. This is the only way
to turn on the internal midi port.

Alternatively, you can turn on midi by using the Serial Port manager, in the
System Manager tools. This provides a more user friendly interface, and
once configured, a serial port will remain configured even after a reboot. If
you find that selecting the System Manager or the Serial Port manager
generates error messages pertaining to the object server, try the following
sequence of commands:

    /etc/init.d/cadmin stop
    /etc/init.d/cadmin clean
    /etc/init.d/cadmin start

You can verify that your midi is working by starting Midi Daemon with the
-v (verbose) option. If midi is working, you will get a message printed in the
window for every midi message received.

If you wish to use serial port number one for receiving midi, it is important
to turn off automatic spawning of getty's on that port. To do this, you must
edit the file /etc/inittab. Find the line that starts with:

    t1:23:respawn:/sbin/getty ttyd1

Change this to:

    t1:23:off:/sbin/getty ttyd1

Save the file and reboot the Indy.
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The sbc Tool 



sbc

sbc is used to combine any number of MIDI sequences into a MIDI sequence
bank (a .sbk file). A sequence bank file contains the sequences, one after the
other (8-byte aligned), with a header at the front that allows indexing into
the bank to retrieve individual sequences.

sbc is invoked as follows:

    sbc -o <output file> file0 [file1 file2 file3 ...]
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Chapter 21 



Audio File Formats 



This chapter describes the file formats used for Nintendo 64 audio
development.

The first section details the bank format used by the Sequence Player. The
second section provides information about the Standard MIDI File format as
it relates to Project Reality.

Note: All multi-byte datatypes (short, long, and so on) are stored with the
high byte first. This is the opposite of the Intel ordering found in PCs.
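
Tools that read these files on a PC cannot simply cast file bytes to native
integers. The following sketch shows one portable way to read high-byte-first
values from a byte buffer. The helper names are hypothetical and are not part
of the Nintendo 64 libraries.

```c
#include <stdint.h>

/* Read a 16-bit unsigned value stored high byte first. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Read a 32-bit unsigned value stored high byte first. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Because the bytes are combined with shifts, the result is the same on
big-endian and little-endian hosts.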
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Bank Files 



Bank files store the audio and control information needed to generate audio
from sequencer MIDI events. On the Nintendo 64, this information is
encapsulated in two files: the bank file and the wavetable file.

The Bank (.bnk) file contains control information such as program number
to instrument assignment, key mapping, tuning, and envelope descriptions.
It is loaded into the Nintendo 64 DRAM during playback.

The Wavetable (.tbl) file contains ADPCM compressed audio data. Because
of the size of the data, it is streamed into DRAM (and then to the RCP) only
when it is needed.

The formats for both files are optimized for the Nintendo 64 to be efficiently
used with the Sequence Player and the Sound Player. They are not intended
to be interchange file formats and contain no textual information or other
data not directly related to playing back audio. Many features commonly
found in standard patch and Wavetable formats (for example, AIFF files)
were sacrificed in favor of smaller files in ROM.

Note: References to objects are stored as offsets in the Bank files, but the
alBnkfNew() call converts the offsets to pointers.



ALBankFile

Bank files must begin with an ALBankFile structure. This structure allows
the software to locate data for a specific bank.

    typedef struct {
        s16 revision;
        s16 bankCount;
        s32 bankArray[1];
    } ALBankFile;

The ALBankFile fields are summarized in Table 21-1.

Table 21-1 ALBankFile Structure

Field           Description

revision        File format revision number.

bankCount       Number of banks contained in the Bank file.

bankArray       Array of offsets of the ALBank structures
                in the bank file.



ALBank

The ALBank structure specifies the instruments that make up the bank, as
well as the default sample rate and percussion instrument. Banks may
contain any number of programs.

Note: The percussion field specifies an instrument for the Sequence Player
to use as a default MIDI channel 10 (drum channel) instrument.

    typedef struct {
        s16 instCount;
        u8  flags;
        u8  pad;
        s32 sampleRate;
        s32 percussion;
        s32 instArray[1];
    } ALBank;

Table 21-2 ALBank Structure

Field           Description

instCount       Number of programs (instruments) in
                the bank.

flags           = 0 if instArray contains offsets, and = 1 if
                instArray contains pointers.

pad             Currently unused byte.

sampleRate      The sample rate at which this bank is
                intended to be played.

percussion      The offset (or pointer) to the default
                percussion instrument.

instArray       Array of offsets (or pointers) to
                ALInstrument structures that make up
                this bank.



ALInstrument

The ALInstrument structure contains performance information.

    typedef struct {
        u8  volume;
        u8  pan;
        u8  priority;
        u8  flags;
        u8  tremType;
        u8  tremRate;
        u8  tremDepth;
        u8  tremDelay;
        u8  vibType;
        u8  vibRate;
        u8  vibDepth;
        u8  vibDelay;
        s16 bendRange;
        s16 soundCount;
        s32 soundArray[1];
    } ALInstrument;

Table 21-3 ALInstrument Structure

Field           Description

volume          Overall instrument playback volume.
                0x0 = off, 0x7f = full scale.

pan             Pan position. 0 = left, 64 = center, 127 =
                right.

priority        The priority for voices for this
                instrument. 0 = lowest priority, 10 =
                highest priority.

flags           If soundArray values are offsets, flags =
                0. If they are pointers, flags = 1.

bendRange       Pitch bend range in cents.

soundCount      Number of sounds in the soundArray
                array.

soundArray      Offsets of (or pointers to) the ALSound
                objects in the instrument.



ALSound

The ALSound structure contains information about the individual sounds
that make up an instrument.

    typedef struct Sound_s {
        s32 envelope;
        s32 keyMap;
        s32 wavetable;
        u8  samplePan;
        u8  sampleVolume;
        u8  flags;
    } ALSound;

Table 21-4 ALSound Structure

Field           Description

envelope        Offset of (or pointer to) the ALEnvelope
                object assigned to the sound.

keyMap          Offset of (or pointer to) the ALKeyMap
                object assigned to this sound.

wavetable       Offset of (or pointer to) the ALWavetable
                object assigned to the sound.

samplePan       Pan position of the sound in the stereo
                field. 0 = full left, 0x7f = full right.

sampleVolume    Overall sample volume. 0 = off, 0x7f =
                full scale.

flags           If envelope, keyMap, and wavetable are
                specified as offsets, flags = 0. If they are
                pointers, flags = 1.



ALEnvelope

The ALEnvelope structure describes the attack-decay-sustain-release
(ADSR) envelope for a sound.

Note: Release volume is assumed to be 0.

    typedef struct {
        s32 attackTime;
        s32 decayTime;
        s32 releaseTime;
        s16 attackVolume;
        s16 decayVolume;
    } ALEnvelope;

Table 21-5 ALEnvelope Structure

Field           Description

attackTime      Time, in microseconds, to ramp from
                zero gain to attackVolume.

attackVolume    Target volume for the attack segment of
                the envelope.

decayTime       Time, in microseconds, to ramp from the
                attackVolume to the decayVolume.

decayVolume     Target volume for the decay segment of
                the envelope. The sustain loop holds at
                the decayVolume.

releaseTime     Time, in microseconds, to ramp to zero
                volume.
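
As an illustration of how the attack and decay fields fit together, the
following sketch evaluates the envelope volume at a given time while a key is
held. It assumes linear ramps between the target volumes, which the text above
does not state, and it uses a hypothetical stand-in structure rather than the
real ALEnvelope type.

```c
/* Hypothetical mirror of the ALEnvelope fields, for illustration only. */
typedef struct {
    long  attackTime;    /* microseconds */
    long  decayTime;     /* microseconds */
    long  releaseTime;   /* microseconds (not used while the key is held) */
    short attackVolume;
    short decayVolume;
} EnvelopeSketch;

/* Volume t_us microseconds after the note starts, while the key is held.
 * Assumes linear ramps: 0 -> attackVolume -> decayVolume, then sustain. */
int envelope_held_volume(const EnvelopeSketch *e, long t_us)
{
    if (t_us < 0)
        return 0;
    if (t_us < e->attackTime)
        return (int)((long long)e->attackVolume * t_us / e->attackTime);
    t_us -= e->attackTime;
    if (t_us < e->decayTime)
        return e->attackVolume +
               (int)((long long)(e->decayVolume - e->attackVolume) * t_us
                     / e->decayTime);
    return e->decayVolume;   /* sustain holds at decayVolume */
}
```

A release segment could be sketched the same way, ramping from the current
volume to zero over releaseTime microseconds.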



ALKeyMap

The ALKeyMap describes how the sound is mapped to the keyboard. It
allows the sequencer to determine at what pitch to play a sound, given its
MIDI key number and note on velocity.

Note: C4 is considered to be middle C (MIDI note number 60).

Note: Bank files may not contain keymaps that have overlapping key or
velocity ranges.

    typedef struct {
        u8 velocityMin;
        u8 velocityMax;
        u8 keyMin;
        u8 keyMax;
        u8 keyBase;
        u8 detune;
    } ALKeyMap;

Table 21-6 ALKeyMap Structure

Field           Description

velocityMin     Minimum note on velocity for this map.
                0 = off, 0x7f = full scale.

velocityMax     Maximum note on velocity for this map.
                0 = off, 0x7f = full scale.

keyMin          Lowest note in this key map. Notes are
                defined as in the MIDI specification.

keyMax          Highest note in this key map. Notes are
                defined as in the MIDI specification.

keyBase         The MIDI note equivalent to the sound
                played at unity pitch.

detune          Amount, in cents, to fine-tune this
                sample. Range is -50 to +50.
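
The key map lookup described above can be sketched as a simple range test.
The structure and function names below are hypothetical stand-ins, not the
Audio Library's own code. The sketch also assumes that detune is added to the
distance from keyBase, and stores it as a signed byte so that the -50 to +50
range fits.

```c
#include <stddef.h>

/* Hypothetical mirror of the ALKeyMap fields, for illustration only. */
typedef struct {
    unsigned char velocityMin, velocityMax;
    unsigned char keyMin, keyMax;
    unsigned char keyBase;
    signed char   detune;      /* cents; signed here so -50..+50 fits */
} KeyMapSketch;

/* Return the first key map whose key and velocity ranges contain the note,
 * or NULL if none does. Ranges may not overlap, so at most one can match. */
const KeyMapSketch *find_keymap(const KeyMapSketch *maps, size_t count,
                                int key, int velocity)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (key >= maps[i].keyMin && key <= maps[i].keyMax &&
            velocity >= maps[i].velocityMin &&
            velocity <= maps[i].velocityMax)
            return &maps[i];
    }
    return NULL;
}

/* Pitch offset from unity pitch, in cents, for a note played with this map:
 * 100 cents per semitone away from keyBase, plus the fine tuning. */
int pitch_offset_cents(const KeyMapSketch *km, int key)
{
    return (key - km->keyBase) * 100 + km->detune;
}
```

The playback ratio then follows from the usual relation of 1200 cents per
octave, that is, 2 raised to the power (cents / 1200).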



ALWavetable

The ALWavetable structure describes the sample data to be played for the
given sound. It is described in detail below, along with the structures it
contains.

    enum {AL_ADPCM_WAVE = 0,
          AL_RAW16_WAVE};

    typedef struct {
        s32 order;
        s32 npredictors;
        s16 book[1];            /* Must be 8-byte aligned */
    } ALADPCMBook;

    typedef struct {
        u32 start;
        u32 end;
        u32 count;
        ADPCM_STATE state;
    } ALADPCMloop;

    typedef struct {
        u32 start;
        u32 end;
        u32 count;
    } ALRawLoop;

    typedef struct {
        ALADPCMloop *loop;
        ALADPCMBook *book;
    } ALADPCMWaveInfo;

    typedef struct {
        ALRawLoop *loop;
    } ALRAWWaveInfo;

    typedef struct {
        s32 base;
        s32 len;
        u8  type;
        u8  flags;
        union {
            ALADPCMWaveInfo adpcmWave;
            ALRAWWaveInfo   rawWave;
        } waveInfo;
    } ALWaveTable;



Table 21-7 ALWaveTable Structure

Field           Description

base            Offset of (or pointer to) the start of the
                raw or ADPCM compressed wavetable
                in the table (.tbl) file.

len             Length, in bytes, of the wavetable.

type            The type (AL_ADPCM_WAVE or
                AL_RAW16_WAVE) of the wavetable
                structure.

flags           If the base field contains an offset, flags
                = 0. If it contains a pointer, flags = 1.

waveInfo        Wavetable type specific information.

Table 21-8 ALADPCMWaveInfo Structure

Field           Description

loop            Offset or pointer to the ADPCM-specific
                loop structure.

book            Offset or pointer to the ADPCM-specific
                code book.

Table 21-9 ALRAWWaveInfo Structure

Field           Description

loop            Offset or pointer to the raw sound loop
                structure.

Table 21-10 ALADPCMloop Structure

Field           Description

start           Sample offset of the loop start point.

end             Sample offset of the loop end point.

count           Number of times the wavetable is to
                loop. A value of -1 means loop forever.

state           ADPCM decoder state information.

Table 21-11 ALADPCMBook Structure

Field           Description

order           Order of the ADPCM predictor.

npredictors     Number of ADPCM predictors.

book            Array of code book data.

Table 21-12 ALRawLoop Structure

Field           Description

start           Sample offset of loop start point.

end             Sample offset of loop end point.

count           Number of times the wavetable is to
                loop. A value of -1 means loop forever.
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ADPCM AIFC Format 



The compressed ADPCM file format is based around AIFC. It uses a
non-standard compression type and two application-specific chunks that
contain the codebook and loop point information. This file is generated by
the ADPCM encoding tool from standard AIFC and AIFF sample files, and
is used by the Instrument Compiler to generate Bank and Table files.

As in AIFC, chunks are grouped together in a FORM container chunk:

    typedef struct {
        ID    ckID;             /* 'FORM' */
        s32   ckDataSize;
        s32   formType;         /* 'AIFC' */
        Chunk chunks[];
    }

where ckID is always FORM and formType is AIFC. The standard AIFC
chunks, which are essential, are the Common chunk, which contains
information about the sound length, and the Sound data chunk.

    typedef struct {
        u32      ckID;              /* 'COMM' */
        s32      ckDataSize;
        s16      numChannels;
        u32      numSampleFrames;
        s16      sampleSize;
        extended sampleRate;
        u32      compressionType;   /* 'VAPC' */
        pstring  compressionName;   /* 'VADPCM -4:1' */
    }

The current format accepts only a single channel. The numSampleFrames
field should be set to the number of samples represented by the compressed
data, not the number of bytes used. The sampleRate is an 80-bit floating
point number (see AIFC spec).

The Sound data chunk contains the compressed data:

    typedef struct {
        u32 ckID;               /* 'SSND' */
        s32 ckDataSize;
        u32 offset;
        u32 blockSize;
        u8  soundData[];
    }

Both offset and blockSize are set to zero.

The encoded file will include two application-specific chunks. The common
Application Specific data chunk format in AIFC is:

    typedef struct {
        u32 ckID;                   /* 'APPL' */
        s32 ckDataSize;
        u32 applicationSignature;   /* 'stoc' */
        u8  data[];
    }

where data[] contains the application-specific data.

The Codebook application-specific data defines a set of predictors that are
used in the decoding of the compressed ADPCM data.

    typedef struct {
        u16 version;            /* Should be 01 */
        s16 order;
        u16 nEntries;
        s16 tableData[];
    }

The order and nEntries fields together determine the length of the
tableData field. In the current implementation, order, which defines the
ADPCM predictor order, must be 2. nEntries can be anything from 1 to 8.
The length of the tableData field is order*nEntries*16 bytes.
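
The length rule for tableData can be written as a small helper that also
checks the limits stated above. The function name is hypothetical; it is only
a sketch of the arithmetic, not part of the encoding tool.

```c
/* Length in bytes of the codebook tableData field: order * nEntries * 16.
 * Returns -1 if the values fall outside what the current implementation
 * is described as supporting (order 2, 1 to 8 entries). */
long codebook_table_bytes(int order, int nEntries)
{
    if (order != 2 || nEntries < 1 || nEntries > 8)
        return -1;
    return (long)order * nEntries * 16;
}
```

For example, an order 2 codebook with 4 entries occupies 128 bytes of table
data, and the largest supported codebook (8 entries) occupies 256 bytes.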

The Loop application-specific data contains information necessary to allow
the ADPCM decompressor to loop a sound. It has the following structure:

    typedef struct {
        u16 version;            /* Should be 01 */
        s16 nLoops;
        adpcmLoop loopData[];
    }

nLoops defines the number of loop points and hence the number of
adpcmLoop structures in the chunk. In the current library, only one loop
point can be specified. loopData has the following structure:

    typedef struct {
        u16 state[16];
        s32 start;
        s32 end;
        s32 count;
    } adpcmLoop

state defines the internal state of the ADPCM decoder at the start of the
loop and is necessary for smooth playback across the loop point. The start
and end values are represented in samples. count defines the number of
times the loop is played before the sound completes. Setting count to -1
indicates that the loop should play indefinitely.
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Sequence Banks 



To provide a convenient way of collecting multiple MIDI sequences and
accessing them from the ROM, Silicon Graphics has defined a simple
Sequence Bank format. Files of this format are produced by the Sequence
Bank Compiler (sbc), which takes multiple MIDI files and collects them with
a simple header.

The format for the Sequence Bank file header is:

    typedef struct {
        u16 version;
        s16 seqCount;
        ALSeqData seqArray[];
    }

where seqCount is the number of sequences in the file, and the seqArray
gives a list of offsets into the file and lengths for the individual sequences.

    typedef struct {
        u8  *offset;
        s32 seqLen;
    } ALSeqData

The offsets represent the position of the start of the sequence from the
beginning of the file. Note that the start of each sequence is 8-byte aligned
when the Sequence Bank Compiler is used.
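
To show how the header is used to index into the bank, the sketch below looks
up one entry in a sequence bank image held in memory. It assumes the header
is packed with no padding, that each offset is stored as a 32-bit value, and
that all fields are stored high byte first as noted at the start of the
chapter. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one seqArray entry. */
typedef struct {
    uint32_t offset;   /* byte offset of the sequence from the file start */
    int32_t  len;      /* length of the sequence in bytes */
} SeqEntrySketch;

static uint32_t seq_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fetch entry `index` from a sequence bank image of `size` bytes.
 * Assumed layout: 2-byte version, 2-byte seqCount, then 8 bytes per entry.
 * Returns 0 on success, -1 if the index or the image is invalid. */
int seqbank_entry(const unsigned char *file, size_t size, int index,
                  SeqEntrySketch *out)
{
    unsigned count;
    size_t pos;

    if (size < 4)
        return -1;
    count = ((unsigned)file[2] << 8) | file[3];
    if (count > 0x7FFF || index < 0 || (unsigned)index >= count)
        return -1;
    pos = 4 + (size_t)index * 8;
    if (pos + 8 > size)
        return -1;
    out->offset = seq_be32(file + pos);
    out->len    = (int32_t)seq_be32(file + pos + 4);
    return 0;
}
```

The sequence data itself then starts at file + offset and runs for len bytes.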
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Compressed Midi File Format 



The compressed midi file format is composed of a header and up to sixteen
individual tracks. Each midi channel will have its own track. If there are no
midi events for a particular channel, a track will not be created, and the
offset to that track will be set to zero.

The compressed midi file header is a collection of 16 offsets and a division
value.

    typedef struct {
        u32 trackOffset[16];
        u32 division;
    } ALCMidiHdr;

The offset is measured in bytes from the beginning of the file to the
beginning of the track. The division value is taken from the input midi file.

The format for the individual tracks is similar to the format used in a
standard midi file. Each track consists of a series of events, separated by
delta times in ticks. Ticks are specified using variable length numbers, and
every event must have a delta value, even if that value is zero. Midi events
are of the same format as that used in the standard midi file except as
specified below.

1.  There are no note offs; instead, note ons are followed by a variable
    length number that specifies the number of ticks duration. As an
    example, a note on of middle C with a velocity of 80 and a duration of
    240 ticks would be expressed by the following sequence of hex bytes:
    0x90 0x3C 0x50 0x81 0x70. Note that when calculating the deltas
    between events, the duration is not taken into account.

2.  Only two types of meta events are supported, tempo events and end of
    track events, and they are both slightly altered. Tempo events are
    composed of a meta status byte (0xFF), a subtype byte (0x51), and three
    bytes that contain the new tempo. (Note that the len byte has been
    removed.) The end of track event is composed of only two bytes, a meta
    status byte (0xFF) and a subtype byte (0x2F). Care should be taken to
    see that the end of track event occurs after all the notes in the track
    have played out their full duration.

3.  Loops are allowed using a combination of loop start and loop end
    events. A track can have up to 128 loops, which can be nested. Each loop
    within a track has a unique loop number. The loop start event is
    composed of four bytes: a meta status byte (0xFF), a loop start subtype
    byte (0x2E), a loop number (0-127), and an end byte (0xFF). A loop end
    event is composed of eight bytes: a meta status byte (0xFF), a loop end
    subtype byte (0x2D), a loop count byte, a current loop count byte
    (should be the same as the loop count byte), and four bytes that specify
    the number of bytes difference between the end of the loop end event
    and the beginning of the loop start event. (Note that if this value is
    calculated before the pattern matching compression takes place, this
    value will have to be adjusted to compensate for any compression of
    data that takes place between the loop end and the loop start.) The loop
    count value should be zero to loop forever; otherwise it should be set
    to one less than the number of times the section should repeat (i.e. to
    hear a section eight times, you would set the loop count to seven).

4.  Running status is supported for all events except across meta events
    and across loop points.
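
The note on example in item 1 can be reproduced with a short encoder. The
sketch below writes a standard MIDI variable length number and then a note on
followed by its duration. The function names are hypothetical and this is not
the compressor's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `value` as a MIDI variable length number: 7 bits per byte, most
 * significant group first, with the high bit set on every byte except the
 * last. `out` must have room for 5 bytes. Returns the bytes written. */
size_t write_varlen(uint32_t value, unsigned char *out)
{
    unsigned char tmp[5];
    size_t n = 0, i;

    do {                                /* collect 7-bit groups, low first */
        tmp[n++] = (unsigned char)(value & 0x7F);
        value >>= 7;
    } while (value != 0);

    for (i = 0; i < n; i++) {           /* emit them high group first */
        unsigned char b = tmp[n - 1 - i];
        if (i + 1 < n)
            b |= 0x80;                  /* continuation bit */
        out[i] = b;
    }
    return n;
}

/* Write a note on (status, key, velocity) followed by its duration in
 * ticks. `out` must have room for 8 bytes. Returns the bytes written. */
size_t write_note_on(int channel, int key, int velocity,
                     uint32_t duration_ticks, unsigned char *out)
{
    out[0] = (unsigned char)(0x90 | (channel & 0x0F));
    out[1] = (unsigned char)(key & 0x7F);
    out[2] = (unsigned char)(velocity & 0x7F);
    return 3 + write_varlen(duration_ticks, out + 3);
}
```

Calling write_note_on(0, 60, 80, 240, buf) produces the five bytes
0x90 0x3C 0x50 0x81 0x70 given in the example above. The delta time that
precedes the event would be written separately with write_varlen.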

The compressed midi file format uses a system of matching patterns in the
data, and replacing them with markers, instead of repeating the data. When
constructing tracks, any pattern of data that repeats previous data may be
replaced with a marker. A pattern marker consists of four bytes. The first
byte is 0xFE. The second two bytes are an unsigned 16-bit value that specifies
the difference, in bytes, between the beginning of the marker and the
beginning of the pattern. The last byte is the length of the pattern. In order to
distinguish between a data byte of 0xFE and a pattern marker's first byte,
any data byte of 0xFE will be followed by another byte of 0xFE.

Note: The maximum pattern length is 0xFF and the maximum distance
between the marker and the pattern is 0xFDFF.

Nesting of patterns is not supported. If a marker is encountered within a
repeated pattern, the marker data will be returned to the sequence player as
actual midi data.

Note: Patterns replaced with markers may not contain bytes of value 0xFF
or the current loop count byte of a loop end event.
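
The marker scheme can be illustrated with a byte-level expander. This is a
sketch under stated assumptions, not the sequence player's code: the 16-bit
distance is read high byte first, the pattern bytes are copied raw from the
compressed track (so a marker inside a pattern is passed through as data),
and the function name is hypothetical.

```c
#include <stddef.h>

/* Expand pattern markers in a compressed track of in_len bytes into out.
 * Returns the number of bytes written, or (size_t)-1 if the input is
 * malformed or out_cap is too small. */
size_t expand_patterns(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t ip = 0, op = 0, i;

    while (ip < in_len) {
        if (in[ip] != 0xFE) {                   /* ordinary data byte */
            if (op >= out_cap)
                return (size_t)-1;
            out[op++] = in[ip++];
            continue;
        }
        if (ip + 1 >= in_len)
            return (size_t)-1;
        if (in[ip + 1] == 0xFE) {               /* escaped literal 0xFE */
            if (op >= out_cap)
                return (size_t)-1;
            out[op++] = 0xFE;
            ip += 2;
            continue;
        }
        if (ip + 3 >= in_len)
            return (size_t)-1;
        {
            /* Marker: distance back from the marker start, then length. */
            size_t dist = ((size_t)in[ip + 1] << 8) | in[ip + 2];
            size_t len  = in[ip + 3];
            const unsigned char *src;

            if (dist == 0 || dist > ip || ip - dist + len > in_len ||
                op + len > out_cap)
                return (size_t)-1;
            src = in + (ip - dist);
            for (i = 0; i < len; i++)           /* raw copy, no nesting */
                out[op++] = src[i];
            ip += 4;
        }
    }
    return op;
}
```

Because the maximum distance is 0xFDFF, the byte after a marker's 0xFE is
never 0xFE itself, which is what lets the escape and the marker be told apart.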
Chapter 22

Nintendo 64 Audio Memory Usage 



The following sections discuss the memory used by the audio system in a
typical application. Memory requirements and optimization are discussed
in detail.



NU6-06-0030-001G of October 21, 1996 441 



NINTENDO 64 PROGRAMMING MANUAL DRAFT 



Overview of audio RDRAM usage. 



The amount of RDRAM needed by the audio system is dependent on
numerous factors. Most importantly, the number of sounds being played at
any given time will determine the size of most buffers. Most buffers must be
large enough to accommodate the worst case scenario. Applications with
fewer voices will need fewer buffers. The sample rate and frame rate chosen
will affect the size of several important buffers.



Audio Buffers 

Most of the audio memory that can be optimized comes from the following
buffers:

•   The Sample DMA Buffers

•   The Command List Buffers

•   The Audio Output Buffers

There are several other buffers, but the gains obtained by optimizing them
are less significant. These include:

•   The Audio Thread Stacksize

•   The Synthesizer Update Buffers

•   The Sequencer Event Buffers

In addition to optimizing the buffers listed above, it is important that several
other buffers are no larger than they need to be. While you can't optimize
them per se, you should check to make sure that their size is no bigger than
need be. Important buffers of this type include:

•   The Audio Heap

•   The Sequence Buffer

•   The Bank Control File Buffer

•   The Reverb Delay Line Buffer

Because the heap size is dependent on the size of the buffers allocated from
the heap, it is important to optimize the other buffers first.
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Sample Rate, Frame Rate, and Other Factors 

In order to determine the size of most of the buffers, you will need to
determine several factors first, most importantly sample rate and frame rate.
Higher sample rates will require larger output buffers, more DMA space,
and larger command list buffers. Likewise, slower frame rates require larger
output buffers, more DMA buffer space, and larger command list buffers.

Note: Audio frame rate can be different from video frame rate. It is possible
for the audio to be operating at 60 frames per second, while the graphics are
operating at 30 frames per second.

In addition to the sample rate and frame rate, the specific sounds, and how
they are set up, can affect the size and number of DMA buffers, as can the
individual sequences used.
Optimizing Buffer Sizes



Audio DMA Buffers

The first area to try to optimize is the number of DMA buffers. These
buffers are used by the audio synthesizer to store samples from the cartridge
during creation of the output buffers. In the worst-case scenario you will
need four buffers for every voice you have allocated. However, in practice
you need only a portion of that. The actual number of buffers you will need
is very dependent on the sequences and sound effects played. To optimize
this value, allocate enough buffers to keep from crashing, and then play
your game for a while. At the end of each frame you should be calling a
routine that frees DMA buffers that have become stale (called
__clearAudioDMA in the example programs). In this routine, before
discarding stale buffers, step through the list of used DMA buffers and count
how many there are. If you keep track of the maximum value, you can report
it at the end of game play, using your choice of debugging method. The
following code is an example of how to perform this count.

    #ifdef AUD_MEM_PROF
        /* Count the DMA buffers currently on the used list. */
        ampDMAcount = 0;
        dmaPtr = dmaState.firstUsed;
        while (dmaPtr)
        {
            ampDMAcount++;
            dmaPtr = (AMDMABuffer *) dmaPtr->node.next;
        }

        /* Remember the largest count seen so far. */
        if (ampDMAcount > ampMaxDMABufs)
            ampMaxDMABufs = ampDMAcount;
    #endif

Because the number of buffers used can vary slightly, even when playing the
same music and sound effects, it is always a good idea to have a few more
buffers than you ever found yourself needing.

In addition to the number of DMA buffers needed, it is helpful to know the
maximum number of DMAs performed in any frame. This number will allow you
to optimize the number of DMA message buffers you will need. Because a
message buffer is substantially smaller than a DMA buffer, this optimization
saves little memory. However, it is easily performed, since there is a
variable that reports the number of DMAs done each frame. All you need to do
is record its maximum value, checking it once a frame, and then report that
value at the same time you report the number of DMA buffers used.
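
The bookkeeping described above amounts to keeping two high-water marks and
adding a safety margin when choosing the final allocation. The following is
a minimal sketch only: the AudioDMAStats structure and both function names
are hypothetical and are not part of the audio library. The per-frame counts
are assumed to come from your own stale-buffer routine and from the library's
DMA counter.

```c
#include <assert.h>

/* Hypothetical high-water-mark tracker; not part of the audio library. */
typedef struct {
    int maxBuffers;  /* most DMA buffers in use at the end of any frame */
    int maxDMAs;     /* most DMAs performed during any single frame */
} AudioDMAStats;

/* Call once per frame with the counts gathered for that frame. */
static void audioStatsUpdate(AudioDMAStats *stats,
                             int buffersInUse, int dmasThisFrame)
{
    assert(stats != 0 && buffersInUse >= 0 && dmasThisFrame >= 0);
    if (buffersInUse > stats->maxBuffers)
        stats->maxBuffers = buffersInUse;
    if (dmasThisFrame > stats->maxDMAs)
        stats->maxDMAs = dmasThisFrame;
}

/* Suggested allocation: the observed maximum plus a few spares. */
static int audioStatsSuggest(int observedMax, int margin)
{
    assert(observedMax >= 0 && margin >= 0);
    return observedMax + margin;
}
```

At the end of game play, print maxBuffers and maxDMAs with whatever debugging
method you prefer, then size the DMA buffer pool and the DMA message buffers
from the suggested values.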

Another place for optimization is the length of the DMA buffers. Longer
buffers will require fewer buffers and use fewer DMAs. Conversely, smaller
buffers will require more buffers and more DMAs. Generally, the smaller
buffers, even though more are required, will use memory more efficiently.
However, the smaller buffer sizes will also generate more DMAs and for
that reason are less efficient in terms of processing time. It is up to the
developer to decide what trade-off between memory usage and processing
time to pick. Optimal buffer sizes are probably ones that will handle enough
samples to process one frame of audio. Below is a table that compares the
same music played back with various buffer sizes. (All other factors were the
same.)

Table 22-1 DMA Buffer Length

DMABufLength    MaxDMA/Frame    MaxDMABuffers    BufLen*MaxBufs
0x600           12              26               39936
0x500           12              30               38400
0x400           14              34               34816
0x300           16              38               29184
0x280           17              43               27520
0x200           22              50               25600



As can easily be seen, the amount of buffer space needed goes up as the size
of the buffers goes up, even though fewer buffers are needed. However, at the
same time, the number of DMAs goes down. In this case, the value of 0x500 is
probably optimal, since it causes the least number of DMAs per frame in
the worst-case situation, but allows the memory allocated to buffers to be
smaller than it would be with buffers of size 0x600.
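
The comparison made above can be expressed as a small selection routine. The
sketch below is one possible policy, not a rule from the audio library: it
prefers the row with the fewest DMAs per frame and breaks ties by total
buffer memory. The DMAMeasurement type and the function names are
hypothetical; the data is copied from Table 22-1.

```c
#include <assert.h>
#include <stddef.h>

/* One measured configuration (a row of Table 22-1). */
typedef struct {
    unsigned bufLength;   /* DMA buffer length in bytes */
    unsigned maxDMAs;     /* maximum DMAs observed in one frame */
    unsigned maxBuffers;  /* maximum DMA buffers observed in use */
} DMAMeasurement;

static const DMAMeasurement table22_1[] = {
    { 0x600, 12, 26 }, { 0x500, 12, 30 }, { 0x400, 14, 34 },
    { 0x300, 16, 38 }, { 0x280, 17, 43 }, { 0x200, 22, 50 }
};

/* Memory needed for all DMA buffers of one configuration. */
static unsigned dmaBufferBytes(const DMAMeasurement *m)
{
    return m->bufLength * m->maxBuffers;
}

/* Fewest DMAs per frame first; among equals, least memory. */
static size_t pickBufferLength(const DMAMeasurement *rows, size_t count)
{
    size_t best = 0, i;
    assert(rows != NULL && count > 0);
    for (i = 1; i < count; i++) {
        if (rows[i].maxDMAs < rows[best].maxDMAs ||
            (rows[i].maxDMAs == rows[best].maxDMAs &&
             dmaBufferBytes(&rows[i]) < dmaBufferBytes(&rows[best])))
            best = i;
    }
    return best;
}
```

A game that is short on memory rather than processing time would weight the
two criteria the other way around.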

Another constant that can be changed is FRAME_LAG. This value defines
how long a DMA buffer will be kept after it has been used. If you continually
use the same sample, that sample will be kept in memory, and will not need
to be DMA'ed again. Higher lag values will lower the number of DMAs but
will increase the number of DMA buffers needed.



Command List Size

Like the number of DMA buffers, the command list size is dependent on the
sequences and sound effects used by the game. To optimize the command
list size, simply record the maximum value used, and check that value at the
end of game play. Because this can vary, even when playing the same audio,
it is wise to leave a little more than you ever needed.



Output Buffer Size

The output buffer size is determined by the audio playback rate and the
frame rate. If you sync audio to the vertical retrace you will need to have
three audio output buffers. If you sync the audio to the audio completion
interrupt, you will only need to have two output buffers. Example code is
included in the example applications demonstrating how to calculate the size
of the output buffers.
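
As a rough illustration of how playback rate and frame rate determine the
output buffer size, the sketch below computes the samples needed per audio
frame and the resulting memory. It is not the code from the example
applications. The assumptions are 16-bit stereo output (4 bytes per sample
frame) and rounding the frame length up to a multiple of 16 samples; all
function names are hypothetical.

```c
#include <assert.h>

/* Samples needed per audio frame, rounded up to a multiple of 16
   (the rounding is an assumption of this sketch). */
static unsigned samplesPerFrame(unsigned playbackRate,
                                unsigned framesPerSecond)
{
    unsigned samples;
    assert(playbackRate > 0 && framesPerSecond > 0);
    /* Ceiling division: never produce fewer samples than are consumed. */
    samples = (playbackRate + framesPerSecond - 1) / framesPerSecond;
    return (samples + 15u) & ~15u;
}

/* Bytes in one output buffer, assuming 2 channels of 2 bytes each. */
static unsigned outputBufferBytes(unsigned playbackRate,
                                  unsigned framesPerSecond)
{
    return samplesPerFrame(playbackRate, framesPerSecond) * 2u * 2u;
}

/* Total memory for all output buffers: three when syncing to the vertical
   retrace, two when syncing to the audio completion interrupt. */
static unsigned totalOutputBytes(unsigned playbackRate,
                                 unsigned framesPerSecond,
                                 unsigned bufferCount)
{
    assert(bufferCount >= 2);
    return outputBufferBytes(playbackRate, framesPerSecond) * bufferCount;
}
```

Halving the audio frame rate from 60 to 30 frames per second doubles the size
of each buffer, which is why slower frame rates need larger output buffers.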



Audio Thread Stack Size

The audio thread stack size can be determined using the stack tool, and
optimized accordingly.



Synthesizer Update Buffers and Sequencer Event Buffers

Synthesizer update buffers and sequencer event buffers are allocated from
the audio heap when the synthesizer and sequencer are created. There is, at
present, no way to efficiently optimize these values. However, because the
size of each buffer is small, it is better to allocate a few too many than
not enough.
The Audio Heap

Once all calls to alHeapAlloc have been completed, you can determine the
amount of the heap that has been used by subtracting the heap's base
value from the heap's current value. These values are part of the heap structure.
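
The measurement is a single pointer subtraction: the bytes used are the
distance from the start of the heap to its current allocation position. The
sketch below uses a hypothetical HeapInfo structure as a stand-in for the
heap structure; the field names base and cur are assumptions here and should
be checked against the audio library header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the heap structure; only the two fields
   needed for the measurement are shown. */
typedef struct {
    unsigned char *base;  /* start of the heap */
    unsigned char *cur;   /* next free byte after all allocations so far */
} HeapInfo;

/* Bytes consumed so far: current position minus base. */
static size_t heapBytesUsed(const HeapInfo *hp)
{
    assert(hp != NULL && hp->cur >= hp->base);
    return (size_t)(hp->cur - hp->base);
}
```

Reporting this value after initialization shows how far the heap can be
shrunk; leave some slack if later allocations are possible.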



The Sequence Buffer

The sequence buffer needs to be large enough to hold the largest sequence
that will be used.



The Bank Control File Buffer

The bank control file buffer needs to be large enough to hold the bank control
file. This is the .ctl file.
Chapter 23

Using The Audio Tools



This chapter instructs the musician and sound designer in how to use the
audio development tools currently available for the Nintendo 64. It is
divided into the following sections:

• An overview of the audio system.

• Discussion of the constraints and decisions that should be made in
conjunction with the programmer or game designer.

• Suggestions for creating samples.

• Playback parameters and the .inst file.

• How to create bank files.

• MIDI files and MIDI implementation.

• Music development tools.
Overview of Audio System



In order for the musician or sound designer to produce sounds and music
for the Nintendo 64, a short explanation of the audio system is helpful,
though not necessary. To that end, a brief description of the audio system is
included here. (The audio system is discussed in greater detail in the
programmer's documentation.) In addition to a brief description of the audio
system, several important items the musician should be aware of are listed
below.

Brief Description of Audio System

The audio system for the Nintendo 64 is composed of a Sound Player (for
playing single samples, such as sound effects) and a Sequence Player (for
playing music). When the game starts up, it creates and initializes a sound
player and a sequence player. It then assigns a bank of sound effects to the
sound player, and assigns a bank of instruments and a bank of MIDI
sequences to the sequence player. To play a sound effect, the game sends a
message to the sound player, telling it what sound effect to set as its target,
and then sends another message to the sound player, telling it to play the
target sound. To play a MIDI sequence, the game must load the sequence
data, then attach the sequence to the sequence player, and then send a
message to the sequence player to start playing the music.

Note: Musical sequences can be stored either as Type 0 MIDI files, or in a
compressed MIDI format unique to the Nintendo 64. It is very important that
the programmer and the musician agree on which file format to use.

There are several components to the sound system. First, there are the
samples that are stored in ROM. Accompanying the samples are a group of
parameters used for playback (Key Mappings, Envelopes, Root Pitch, and so
on). In order to process the sounds, a section of the RAM must be allocated
for the audio system.

In software, there are two main sections. One part runs on the CPU and the
other part runs on the RSP. The audio system must share the RSP with the
graphics processing. The RSP is where most of the low-level processing
takes place, and this is where the samples are mixed into an output stream.
This output stream is then fed to a pair of DACs for stereo output.
There are four types of files used by the game for audio production: .ctl, .tbl,
.seq, and .sbk. Before the game can play back either sound effects or music,
the musician and sound designer must create these files. The .tbl files contain
the compressed samples. The .bnk files contain the associated control
information necessary for playback. .bnk files and .tbl files are always
paired.

The .seq files are MIDI files that have all unneeded events removed, and the
.sbk files are banks of .seq files. Typically there will be at least one pair of
.bnk and .tbl files for music, and a separate pair for sound effects. (Although
it would be possible to put all sounds into one pair, or alternatively have
numerous pairs.)

The reason that banks are stored in two files is that the raw audio data
then doesn't need to be loaded into RAM; only the information pointing to the
samples and the values for the playback parameters are loaded. When a sound
is to be played, only a small portion of the sample is loaded into a RAM
buffer. After it has been used for playback, it can be discarded, and the
buffer reused for the next portion of the sample. The result is that a
comparatively small amount of RAM is needed for sound.



Typical Development Process

When creating audio for a Nintendo 64 game, the musician typically
follows these steps:

1. Create the samples as AIFF files.

2. Encode the samples into AIFC files.

3. Create a .inst file.

4. Compile the .inst file, with the samples, into the bank files.

5. Create the MIDI sequence files.

6. Compile the MIDI sequence files into .seq files, and then compile the
.seq files into a .sbk file.

7. Deliver the .tbl, .bnk, and .sbk files to the programmer.
Common Values

Throughout this document and when referring to .inst files, several things
are kept constant:

• Middle C (MIDI note 60) is referred to as C4. (Some synthesizer and
software manufacturers refer to Middle C as C3.)

• Pan values range from 0 to 127, with 0 being full left, 64 center pan, and
127 full right.

• Volumes are from 0 to 127, with 0 meaning there will be no sound, and
127 being full volume.
Dealing With Constraints and Allocating Resources



When you use the Nintendo 64 system, there are several choices that you
must make. Most of these choices center around how to use the fewest
system resources, while still maintaining a sufficient level of quality.
Unconstrained by limits on available resources, the Nintendo 64 system is
capable of audio rivaling top-of-the-line samplers.

Most of the limits in the software system are easily changed. However, in
most cases a great deal of time can be saved if the programmer, game
designer, and musician all agree beforehand what these values are going to
be set to.

The limits on resources will fall into several categories:

• determining hardware playback rate

• limits of voices and processing time

• division of sounds and music into banks

• limits of ROM space

Determining Hardware Playback Rate

The principal decision to make about software is deciding what playback
rate the hardware should be set to. Typically, rates from 22050 Hz to
44100 Hz are chosen. Higher rates require the software to produce more
samples, and consequently take more processing time. Although there are
no hard rules to follow, values of 44100 Hz are ideal, but values of 32000 Hz
and 22050 Hz do not produce a substantial loss of audio quality. Values
below 22050 Hz quickly begin to degrade the quality of the audio.

Also of considerable importance is the fact that samples sound better if the
output rate is as close as possible to their sample rate. If all the samples in
the game are sampled at 22050 Hz, the output quality will be best with a
playback rate of 22050 Hz. If there is uncertainty in the planning process, it
is better to start with a higher rate, and resample down later, than to start
with a lower rate and resample up later.
Limits of Voices and Processing Time

The factor limiting the number of voices available for playback is the amount
of time the audio will have for processing. Obviously, the more voices, the
more processing time needed, and the higher the audio playback rate, the
more time needed. As a rough guideline, it is estimated that 1% of RSP time
is needed for each voice when playing at 44.1 kHz. So if the audio is given
20% of RSP processing time, then fifteen to twenty voices will be possible.
However, if the audio is given 40% of processing time, then 30 to 40 voices
will be possible. Remember that a lower output playback rate reduces
processing time, thus increasing the number of voices available for playback.



Division of Sounds and Music into Banks

There are no formal rules specifying how the sounds and music will be
organized. However, in most cases it is best to organize the sound effect
samples into a bank (or banks) separate from the music samples.

There are two ways that the sequences may be stored in the game. They may
be stored as separate sequences, or they may be compiled into a .sbk file. The
music samples and MIDI files should be organized so that each sequence (or,
if used, each bank of MIDI files) has a corresponding bank of music samples.
If samples are shared by different MIDI files, they should be stored in the
same bank. If the sequences do not share the same sample bank, duplicates
of the samples will be produced in the different bank files.



Limits of ROM

The amount of space available for audio is strictly up to the game developer.
Creating Samples



Creating samples for the Nintendo 64 is similar to creating samples for any
sample player. However, there are several additional facts to keep in mind.

To be recognized by the ADPCM tools, the samples should be stored as AIFF
files, or uncompressed AIFC.

Samples benefit from being sampled at the same sample rate as the output
playback sample rate. Because all samples are compressed with a variation
of ADPCM, when they are played back at rates significantly different from
their sampled rate, the noise can become rather obvious.

As an example, if the output sample rate is set to 44100 Hz, but the sample
is sampled at only 22050 Hz, then to play back the sample at its original
pitch, the sample converter must create two samples from each sample. Worse,
if the sample is to be played an octave below its original pitch, the sample
converter must create four samples for each sample. Because of the noise
and distortion introduced by ADPCM, this will not be nearly as good
quality as it would be if samples were recorded at 44100 Hz, or if the output
playback rate were changed to 22050 Hz. For this reason, you may want to
resample all samples to match the output sample rate, before performing the
ADPCM conversion.

Samples may be looped at any location in the sample. Although many
ADPCM systems require you to loop samples at specific boundaries (the
Super Nintendo, for example, required that loop points be multiples of 16),
the Nintendo 64 makes no such requirement. If a sound is looped, it will loop
as long as the sound is playing. When a looped sound's envelope enters the
release phase, the sound will still continue to loop.

All looped samples should last until the next multiple of 16 after the loop
end. (This is because the ADPCM encoding stores the samples in blocks of
16.) For this reason, it is prudent to leave at least 16 samples after the loop
end, on any sample that loops. As a nice feature, the ADPCM tools provided
have an option that truncates any sample to the shortest viable length, so
there is no benefit to the musician calculating and truncating looped
samples.

In other words, when creating looped samples, find your loop points, and
don't worry about the release portion of the sample. You may truncate the
sample to keep the files on your hard disk smaller, but always leave at
least 16 samples after the loop end. Then when you encode the samples,
make sure you use the -t option, and the samples will be automatically
truncated for you.
Playback Parameters and .inst Files

This section contains information about how to create the .inst file.

Setting Sample Parameters in the .inst File

In order for the Nintendo 64 audio system to play back samples correctly, it
must have information for controlling effects such as pitch and volume.
These parameters are set by creating and editing a .inst file. Although some
discussion of parameters follows, it is highly recommended that you review
an example .inst file, because many of the parameters will be much clearer
then.

The .inst file is a collection of objects, defined by text using C language
syntax. The objects are:

• envelopes

• keymaps

• sounds

• instruments

• banks

The objects are related as follows: The basic unit representing a sample is a
sound. The sound has an associated keymap, which specifies the velocity
range, key range, and tuning of the sample. Also, the sound has an
associated envelope that specifies the ADSR used to control the sample's
volume. Sounds can be grouped into an instrument. Instruments are then
grouped into a bank. Currently, there is only one bank in a .inst file. Because
program control changes are limited to values from 1 to 128, MIDI sequences
can only use the first 128 instruments in a bank. Game applications can select
higher values by calls to the audio API.

Differences Between Sound Player and Sequence Player
Use of .inst Files

The sound player and sequence player use the bank files created from the
.inst files in different ways. While the sequence player uses the bank to
identify instruments, and then uses the keymaps to identify which sound to
play for which MIDI notes, the sound player does none of this. The sound
player does not use the bank structure, the instrument structure, or the
keymap parameters. However, for the .inst file to compile, every .inst file
must have a bank and an instrument, and every sound must point to a
keymap. This keymap may be shared by all the sounds in the .inst file, so
only one keymap is needed.

For these reasons, the example .inst sound effects files are set up with one
bank, with only one instrument, that lists the sounds in sequential order.
There is no concern for overlapping of keymaps in this case, because the
sound player ignores them. However, there is one default keymap, in order
to allow the file to compile. In order for the pitch of a sound effect to be
altered from its recorded pitch, the application must set the pitch, not the
.inst file.



Envelopes 

The Nintendo 64 audio system supports the use of ADSR envelopes for
controlling volume. Envelope time values are in microseconds. (Because
microseconds are a much finer control than most synthesizers and samplers
use, musicians will have to adjust their thinking to accommodate much
larger numbers than are usually used by samplers. Remember, an
attackTime of 100,000 will produce an attack of one tenth of a second.)
Maximum volume values are 127. In order to avoid any pops or clicks at the
ends of sounds, you should always end an envelope with a release volume
of zero. This is particularly true in the case of looped samples.

When using the sound player to play sound effects, if the decay time is set to
-1, then the envelope will never enter the release phase. (In other words, it
will loop forever.) To stop the sound, the game will have to call alSndpStop().



Keymaps and Velocity Zones 

Note: Keymaps are used only by the sequence player. They are ignored by 
the sound player. 

In addition to an envelope, every sample has a keymap. This keymap
defines what keys and velocities the sample will respond to. By using
different keymap settings, it is possible to create instruments that play
different samples for different keys and velocities.

In the keymap object, you set the minimum and maximum velocity values,
as well as the minimum and maximum keys to respond to. Note that you
cannot create overlapping keymap zones. When the sequence player is
trying to map a note to be played, it will search through the possible
keymaps, and when it finds one that it can use, it will not continue to search.

Note: The Nintendo 64 imposes an upper limit on the keyMax value of one
octave more than the keyBase.

Tuning for Samples Recorded at the Hardware Playback
Rate

In addition to the velocity and key zone information contained in the
keymap structure, all samples have a keyBase and a detune value. The
keyBase sets the sample's pitch in semitones, and the detune value is used
to fine-tune the sample in cents. (A cent is 1/100th of a semitone.) If the
sample rate of the sound matches the hardware playback rate, the keyBase
is the MIDI note value of the sample's original pitch. If the sample rate does
not match the hardware playback rate, the keyBase must be altered to
compensate for the difference in rates.

As an example, if a note of F4 is recorded at 44100 Hz, and the playback rate
is also 44100 Hz, then the keyBase should be set to 65 (since 65 is equivalent
to MIDI note F4) and the detune is set to zero.

Tuning for Samples Recorded at Varying Rates

One of the more complicated aspects of the .inst files is the tuning of samples
that are not sampled at the same rate as the hardware output rate.
(Remember that the hardware output rate is determined by software, and can
be changed.) Although the sample rate will be extracted from the AIFF file,
you must adjust the keyBase parameter and the detune parameter if you
want the sample to play back at the correct pitch.

In order to calculate keyBase and detune from a given sample rate, use the
following formula:

N = semitones to add to keyBase

$N = 12 \log_2(\mathrm{HardwareRate} / \mathrm{SampleRate})$
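
The formula can be evaluated with a few lines of C. The sketch below is
illustrative only: TuningOffset, tuningOffset, and log2Approx are
hypothetical names, and it assumes that the whole part of N is added to the
keyBase while the fractional part, in hundredths of a semitone, is what
belongs in the detune value. The base-2 logarithm is computed by repeated
squaring so that the example has no math library dependency; log2() from
<math.h> would serve equally well.

```c
#include <assert.h>

/* Base-2 logarithm by repeated squaring, accurate to about 2^-30. */
static double log2Approx(double x)
{
    double result = 0.0, bit = 0.5;
    int i;
    assert(x > 0.0);
    while (x >= 2.0) { x /= 2.0; result += 1.0; }  /* whole octaves up */
    while (x < 1.0)  { x *= 2.0; result -= 1.0; }  /* whole octaves down */
    for (i = 0; i < 30; i++) {                     /* fractional bits */
        x *= x;
        if (x >= 2.0) { x /= 2.0; result += bit; }
        bit /= 2.0;
    }
    return result;
}

typedef struct {
    int semitones;  /* whole semitones to add to the sample's MIDI note */
    int cents;      /* remainder in cents, assumed to map onto detune */
} TuningOffset;

/* N = 12 * log2(HardwareRate / SampleRate), rounded to the nearest cent. */
static TuningOffset tuningOffset(double hardwareRate, double sampleRate)
{
    TuningOffset t;
    double cents;
    long rounded;
    assert(hardwareRate > 0.0 && sampleRate > 0.0);
    cents = 1200.0 * log2Approx(hardwareRate / sampleRate);
    rounded = (long)(cents >= 0.0 ? cents + 0.5 : cents - 0.5);
    t.semitones = (int)(rounded / 100);
    t.cents = (int)(rounded % 100);
    return t;
}
```

For a sample recorded at 22050 Hz and a hardware rate of 44100 Hz this yields
12 semitones and 0 cents, so a sample of C4 (MIDI note 60) would get a
keyBase of 72.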

A much easier way to deal with the tuning issue is to use Table 23-1. In this
case, pick an acceptable rate from the column that corresponds to your
hardware rate. Record your sample at that rate (or resample your sample to
that rate), and then add the number of semitones in the leftmost column to
the MIDI note value of the sample's pitch. Note that this method ensures a
value of zero for the detune.

As an example, suppose that you had a hardware playback rate of 44100, but
you wished to resample a sample of a trumpet playing Bb4 to a
sample rate of about 32000 Hz. Instead of using 32000, you would resample
to a rate of 33037, and then in your .inst file you would add 5 semitones to
the MIDI value. Since Bb4 is the same as MIDI note number 70, you would
add 5 and your keyBase value would be 75.

Table 23-1 Tuning to hardware playback rates

Add to MIDI Value   Hardware Playback   Hardware Playback   Hardware Playback
                    Rate of 44100       Rate of 32000       Rate of 22050
0 semitones         44100               32000               22050
1 semitone          41624.857           30203.978           20812.429
2 semitones         39288.633           28508.759           19644.317
3 semitones         37083.532           26908.685           18541.766
4 semitones         35002.193           25398.417           17501.097
5 semitones         33037.671           23972.913           16518.836
6 semitones         31183.409           22627.417           15591.705
7 semitones         29433.219           21357.438           14716.609
8 semitones         27781.259           20158.737           13890.626
9 semitones         26222.017           19027.314           13111.008
10 semitones        24750.288           17959.393           12375.144
11 semitones        23361.161           16951.410           11680.581
12 semitones        22050               16000               11025



To extend the above table, or produce a table with a different hardware
playback rate, use the following formula:

Sample Rate = S

Hardware Rate = H

Number of semitones to add to MIDI value = N

$S = \frac{H}{2^{N/12}}$
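
A table for any other hardware rate can be generated by dividing the rate by
the semitone ratio, 2^(1/12), once per semitone. The following is a minimal
sketch; the function name sampleRateFor is hypothetical, and the constant is
simply the twelfth root of two written out so that no math library is needed.

```c
#include <assert.h>

#define SEMITONE_RATIO 1.0594630943592953  /* 2^(1/12) */

/* Sample rate S such that adding `semitones` to the MIDI note value gives
   the correct pitch at the given hardware rate: S = H / 2^(N/12). */
static double sampleRateFor(double hardwareRate, int semitones)
{
    double rate = hardwareRate;
    int i;
    assert(hardwareRate > 0.0 && semitones >= 0);
    for (i = 0; i < semitones; i++)
        rate /= SEMITONE_RATIO;  /* one semitone lower per step */
    return rate;
}
```

Calling sampleRateFor(48000.0, n) for n from 0 to 12, for example, would
produce one column for a 48000 Hz hardware rate in the same layout as the
table above.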



Sounds

A sound structure is simply a reference to the sample, the keymap, the
envelope, a value for pan, and a value for volume. Pan values are in the
range 0 to 127, with 0 equal to full left, 64 equal to center pan, and 127
equal to full right. Volumes are specified by values of 0 to 127.



Instruments

The instrument structure is a list of sounds grouped into an instrument. If
the instrument is a musical instrument to be used by the sequence player, it
is limited to 128 sounds, since that is the maximum number of MIDI notes.
However, if the instrument is for use by the sound player, it may have as
many sounds in it as you like. In addition to the list of sounds, the
instrument has an overall volume and pan. (The sound player ignores these
volume and pan values. Instead the sound player uses the pan and volume
values specified in the sound object.)

The instrument structure can be used to create drum kits. In this case, you
create an instrument that uses multiple sounds and associated keymaps.
(There is a good example of this in the General MIDI Bank provided with the
developer's package.)



Banks 



At the top level of the .inst file is the bank structure. The .inst file may
contain as many banks as needed. The bank must be selected by the application,
since there is currently no way to switch banks via MIDI.



Creating Bank Files

The process for creating sample bank files is as follows:

1. Record the samples and save as AIFF files.

2. Encode the samples using tabledesign and vadpcm_enc.

3. Create the .inst file.

4. Compile the bank using ic.
MIDI Files



Sequences can be stored in the game in one of two ways: either as MIDI file
Type 0, or in a compressed MIDI file format. To use MIDI Type 0, save the file
as either a Type 0 or Type 1 MIDI file, and then use midicvt. To use the
compressed sequence format, save the file as either a Type 0 or Type 1 MIDI
file, and then use midicomp.

The process for creating MIDI sequence bank files is as follows:

1. Create the sequences and save them as MIDI files of either Type 0 or
Type 1.

2. Convert the sequences using either midicvt or midicomp.

3. Compile the sequences using sbc.

The following MIDI messages are supported by both file formats:

• Note on

• Note off

• Polyphonic key pressure

• MIDI Controllers:

  ■ Controller 7: Channel volume

  ■ Controller 10: Channel Pan

  ■ Controller 64: Sustain

  ■ Controller 91: FXMix

• Program Control changes 0-127

• Pitch Bend Change

In addition to the above MIDI messages, the MIDI file meta tempo event is
supported.

Loops in the Sequences

The way loops are implemented in the two sequence formats is very
different. If a game uses MIDI Type 0 format, the loops must be created by
the programmer using audio library calls from within the game code. If the
compressed sequence type is used, loops are inserted by the musician. This
is done using MIDI controllers.

The compressed sequence format supports looping within tracks. A track
can have as many as 128 loops, which can be sequential or nested. Each loop
is numbered, and must have a loop start and a loop end. Optionally it can
have a loop count that specifies the number of times the looped section
should play. Loop counts are limited to values from 0 to 255. A loop count of
zero, the default, will loop forever.

Although the format used in the compressed MIDI file is not detailed here, it
should be noted that when a file is compressed, MIDI events are rearranged
into tracks based on channel. All MIDI events for channel 1 are put in the
first track, all MIDI events for channel 2 are put in the second track, and
so on. This is particularly important when considering loops. If a loop is put
in a track, all MIDI events from that channel will loop.

To insert loops into a compressed MIDI sequence, you will need to insert
extra controllers. These controllers serve as markers for the loop. A loop start
is defined as a controller number 102. A loop end is defined as a controller
103. Within a channel, each loop start and loop end pair must have a unique
number between 0 and 127. This number is what the loop start and loop end
controller's value should be set to. A loop count between 0 and 127 is created
with a controller 104, using values 0 to 127. A loop count between 128 and
255 is created using controller 105, with values 0 to 127. (When a loop count
controller 105 is encountered, the value is added to 128 to produce loop
counts from 128 to 255.)
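
The split of the loop count across controllers 104 and 105 can be captured in
a pair of helper functions. The sketch below is for illustration, for example
in a tool that checks or generates sequences; the ControllerEvent type and
the function names are hypothetical, and only the controller numbers and
value ranges come from the description above.

```c
#include <assert.h>

enum {
    CTRL_LOOP_START      = 102,  /* value = loop number, 0 to 127 */
    CTRL_LOOP_END        = 103,  /* value = loop number, 0 to 127 */
    CTRL_LOOP_COUNT_LOW  = 104,  /* loop counts 0 to 127 */
    CTRL_LOOP_COUNT_HIGH = 105   /* loop counts 128 to 255 */
};

typedef struct {
    int controller;  /* MIDI controller number */
    int value;       /* controller value, 0 to 127 */
} ControllerEvent;

/* Choose the controller and value that express a loop count of 0 to 255. */
static ControllerEvent encodeLoopCount(int count)
{
    ControllerEvent ev;
    assert(count >= 0 && count <= 255);
    if (count < 128) {
        ev.controller = CTRL_LOOP_COUNT_LOW;
        ev.value = count;
    } else {
        ev.controller = CTRL_LOOP_COUNT_HIGH;
        ev.value = count - 128;
    }
    return ev;
}

/* Recover the loop count from a loop count controller event. */
static int decodeLoopCount(ControllerEvent ev)
{
    assert(ev.controller == CTRL_LOOP_COUNT_LOW ||
           ev.controller == CTRL_LOOP_COUNT_HIGH);
    assert(ev.value >= 0 && ev.value <= 127);
    return ev.controller == CTRL_LOOP_COUNT_HIGH ? ev.value + 128 : ev.value;
}
```

A loop count of 200, for instance, is written as controller 105 with a value
of 72.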

As a simple example, consider the following sequence:

    loop 0 start (controller 102 with value 0)
    loop count of 6 (controller 104 with value 6)
    loop 0 end (controller 103 with value 0)

In this case the section between the loop start and the loop end will be played
six times.

It is important to understand that the loop count is not associated with a start
and end pair. When a loop end is encountered, it uses the most recent loop
count, even if there has already been a loop end for another loop. Consider
the following sequence:

    loop 0 start (controller 102 with value 0)
    loop count of 8 (controller 104 with value 8)
    loop 0 end (controller 103 with value 0)
    loop 1 start (controller 102 with value 1)
    loop 1 end (controller 103 with value 1)

In this case, the first loop (loop 0) will have a loop count of 8. The second loop
(loop 1) will also have a loop count of 8, since once set, the loop count
continues until changed. If there has never been a loop count in the
sequence, the loop count is set at its default of 0, which is interpreted as loop
forever.

Warning: All loops must have a loop start and a loop end with at least
one valid MIDI event in between.

Nesting Loops. "* W» 

Injhe compact sequenofformat it is easy to nest loops. Consider the 
following sequence: 

: Cpop stare (controller 102 with value 0) 

fifop 1 start (controller 102 with, value 1) 

lo0|%. count of 8 (controller 104 with value 8} 

loopl||| ! end (controller 103 with value 1) 

loop 2 ; vstart (controller 102 with value 2) 

loop 2 end (controller 103 with value 2) 

gv ; loop 3 start (controller 102 with value 3) 

! '*1'| loop count of 4 (controller 104 with value 4) 

Jg- loop 3 end (controller 103 with value 3) 

£■&-" loop forever (controller 104 with value 0) 

loop end (controller 103 with value 0) 

In this case loop 1 will loop eight times, before the sequence proceeds to loop 
2, which will also loop eight times. After that, loop 3 will loop 4 times, and 
then the entire sequence will loop infinitely. 
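
The rule that a loop end uses the most recent loop count can be checked with a small host-side model. The following is only a sketch under stated assumptions: the CtrlEvent record and resolve_loop_counts() are hypothetical names, not part of the audio library, and only the controller numbers 102, 103 and 104 are taken from the text.

```c
#include <stddef.h>

/* Controller numbers as described in the text. */
enum {
    CTRL_LOOP_START = 102,
    CTRL_LOOP_END   = 103,
    CTRL_LOOP_COUNT = 104   /* loop counts 0..127, 0 means forever */
};

/* Hypothetical event record: one controller change in the sequence. */
typedef struct {
    int controller;
    int value;
} CtrlEvent;

/*
 * Walk the events in order. For every loop end, store the loop count
 * that is in effect at that point into counts[loop number].
 * A stored 0 means "loop forever". Entries for loop numbers that never
 * see a loop end are left untouched.
 */
void resolve_loop_counts(const CtrlEvent *ev, size_t n,
                         int *counts, size_t max_loops)
{
    int current = 0;   /* default when no loop count has been seen */
    size_t i;

    for (i = 0; i < n; i++) {
        switch (ev[i].controller) {
        case CTRL_LOOP_COUNT:
            current = ev[i].value;      /* stays set until changed */
            break;
        case CTRL_LOOP_END:
            if (ev[i].value >= 0 && (size_t)ev[i].value < max_loops)
                counts[ev[i].value] = current;
            break;
        default:                        /* loop start and other events */
            break;
        }
    }
}
```

Feeding the nested sequence above through this model gives a count of 8 for loops 1 and 2, a count of 4 for loop 3, and 0 (forever) for the outer loop, which matches the behavior described.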
Putting Things Together Into Makefiles

In the developer's kit, there is a directory named viper that shows how files
would be arranged to build a bank of music samples. The makefile in this
directory shows examples of setting up rules for files, and dependencies in
a logical order. When you start a project, you can use these files as a template.
General MIDI and the Nintendo 64

Although the Nintendo 64 is not specifically a General MIDI device, it can be 
configured as one. As part of the developer's kit, there is a General MIDI 
Bank that demonstrates this. All the sound files used in this bank are also
provided and may be used by licensed developers in any Nintendo 64 
project. 



Currently, MIDI channel 10 is configured to default to program 128. In the
General MIDI Bank, this is the Standard Drum Kit. If you send a program
change on channel 10, the specified program will be selected, and channel 10
will no longer be the Standard Drum Kit.
Chapter 24

Scheduling Audio and Graphics 



The Nintendo 64 audio and graphics chores are shared between the host CPU
and the RCP. The work to be performed is expressed using an array of
primitives called a command list.

The host CPU is responsible for command list generation. Audio command
lists are generated by calling alAudioFrame(). Graphics command lists are
generated by calling the various graphics macros defined in gbi.h. In
addition, the host CPU is responsible for assembling command lists into
RCP tasks (which consist of command lists, RCP microcode and execution
state information), and for downloading the task at the appropriate time to
the RCP.

The RCP is responsible for command list processing. The RCP microcode
loaded by the host CPU parses the command list, executes the appropriate
core rendering routines, and writes the results to the video frame or audio
buffer.

Since the video frame buffer must be updated at a regular rate (usually 30
frames per second) and the audio buffers must be updated before they are
emptied by the audio DAC to prevent clicks and pops, the application must
schedule the command list generation and processing chores so that
they happen in a "timely manner". This chapter identifies the relevant
scheduling issues and describes the libultra Scheduler that addresses them.
Scheduling Issues



Command List Generation

Command lists are usually generated during the frame before they are to be
processed. Though command list generation should take less than a frame
time to complete, there are infrequent occasions when it may take longer.
When the host CPU misses its completion deadline, a host overrun is said to
have occurred.

The effects of host overruns are generally undesirable. If an audio command
list is not ready to be processed during the next frame time, clicks and pops
will be introduced into the audio stream. If a graphics command list is not
ready to be processed, the video frame buffer will not be updated until the
following frame, which may cause the graphics stream to appear "jerky".

The effects of host overruns on the audio stream can be minimized if the
audio and graphics command lists are generated in separate threads.
Specifically, if the audio thread runs at a higher priority than the graphics
thread, the host CPU can schedule the audio task even though the graphics
task may not be completely generated, preventing clicks and pops from
being introduced into the audio stream.

Alternately, one could implement a dynamic buffering scheme to prevent
overrun by dynamically varying the audio data buffer size to accommodate
any graphics overrun. This approach would require somewhat larger
buffers and is more difficult to implement since overrun is dependent on
things that are not known until runtime.

Note: Calls to alAudioFrame() generate DMA requests, which are assumed
to be complete when the audio command list is processed. The DMA latency
depends on the operation of the audio DMA callback which is implemented
by the application.



Command List Processing 

While audio command list processing time is deterministic (based on the
number of active voices), the graphics command list processing time is
variable (based on the complexity of the scene and the perspective of the
viewer). Unless great care is taken in the construction of the graphics
command lists, they may require more than a frame time to process. This is
called graphics (RCP) overrun.

The effects of graphics overrun can be minimized by suspending the
overrunning task and running the waiting audio task at the beginning of a
video frame. Graphics tasks can be suspended with the osSpTaskYield()
function. See the osSpTaskYield man pages for more information.
Using the Scheduler



The Scheduler is a host CPU thread that addresses the issues discussed
above. It is responsible for executing audio and graphics tasks on the RCP
such that host and RCP overrun is minimized or eliminated.

Each video retrace, the Scheduler reads the new tasks generated by client
threads from the task queue and adds them to the end of a real-time (audio)
or non-real-time (graphics) task schedule list.

If the previous frame's graphics task has overrun, the Scheduler causes the
task to yield. It then runs the next audio task, resumes the yielded task
when the audio task has been completely processed, and then runs any
additional graphics tasks that are to be run in the current frame.

When a task completes, the Scheduler sends a message to the client
indicating that the work it requested is complete.
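
The per-retrace ordering described above can be modeled off-target in a few lines of C. This is an illustrative sketch only, not the libultra Scheduler implementation: the Step names and plan_frame() are hypothetical, and on the target the yield itself is performed with osSpTaskYield().

```c
#include <stddef.h>

/* Hypothetical step names for one frame of RCP work. */
typedef enum {
    STEP_YIELD_GFX,    /* ask the overrunning graphics task to yield */
    STEP_RUN_AUDIO,    /* run this frame's audio task                */
    STEP_RESUME_GFX,   /* resume the yielded graphics task           */
    STEP_RUN_GFX       /* run a newly queued graphics task           */
} Step;

/*
 * Fill out[] with the order of RCP work for one retrace and return the
 * number of steps written (never more than cap).
 *   gfx_overrun: nonzero if last frame's graphics task is still running
 *   have_audio:  nonzero if an audio task is waiting
 *   new_gfx:     number of new graphics tasks queued for this frame
 */
size_t plan_frame(int gfx_overrun, int have_audio, size_t new_gfx,
                  Step *out, size_t cap)
{
    size_t n = 0;
    size_t i;

    if (gfx_overrun && n < cap)
        out[n++] = STEP_YIELD_GFX;
    if (have_audio && n < cap)
        out[n++] = STEP_RUN_AUDIO;      /* audio is the real-time task */
    if (gfx_overrun && n < cap)
        out[n++] = STEP_RESUME_GFX;
    for (i = 0; i < new_gfx && n < cap; i++)
        out[n++] = STEP_RUN_GFX;
    return n;
}
```

With an overrun, one audio task and one new graphics task, the model yields the order: yield, audio, resume, graphics. Without an overrun the audio task simply runs first.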



Creating the Scheduler: osCreateScheduler() 

In order to use the Scheduler, you must first call osCreateScheduler() to
initialize the OSSched data structure, its message queues and the Vi
Manager. The osCreateScheduler() function spawns a thread to schedule
and manage task execution. One of the parameters to this call is the thread
priority, which should be higher than that of the threads which generate the
command lists.



Adding Clients to the Scheduler: osScAddClient()

The Scheduler instantiates the Vi Manager and receives all retrace messages.
However, clients of the Scheduler can receive a copy of the retrace message
by providing a message queue when they sign in. This is accomplished by
calling the osScAddClient() function.

Note: One of the parameters to this call is the message queue on which you 
wish to receive retrace messages. Make sure that the queue is big enough if 
you don't want to lose messages, as the Scheduler does not block when the 
queue is full. 
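
To see why the queue size matters, consider a host-side model of a bounded queue whose sender never blocks. This is a sketch under stated assumptions: MsgQueue, QUEUE_SLOTS, mq_send_noblock() and mq_recv() are hypothetical names used for illustration, not the OS message queue API. The behavior is comparable to sending a message without blocking: when the queue is full, the message is simply lost.

```c
#include <stddef.h>

#define QUEUE_SLOTS 4   /* hypothetical queue depth */

typedef struct {
    int    slots[QUEUE_SLOTS];
    size_t head;    /* index of the oldest queued message */
    size_t count;   /* number of messages currently queued */
} MsgQueue;

/* Non-blocking send: returns 0 on success, -1 if the queue is full
 * (in which case the message is dropped). */
int mq_send_noblock(MsgQueue *q, int msg)
{
    if (q->count == QUEUE_SLOTS)
        return -1;
    q->slots[(q->head + q->count) % QUEUE_SLOTS] = msg;
    q->count++;
    return 0;
}

/* Receive the oldest message: returns 0 on success, -1 if empty. */
int mq_recv(MsgQueue *q, int *msg)
{
    if (q->count == 0)
        return -1;
    *msg = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    return 0;
}
```

If a client falls six retraces behind with a four-slot queue, two retrace messages are lost. Sizing the queue for the longest stall the client can tolerate avoids this.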
Creating Scheduler Tasks: The OSScTask Structure

In order to send tasks to the Scheduler for execution, you must first create
and initialize an OSScTask structure. The structure and a description of its
fields are listed below.

typedef struct OSScTask_s {
    struct OSScTask_s *next;
    s32               state;
    u32               flags;
    void              *framebuffer;
    OSTask            list;
    OSMesgQueue       *msgQ;
    OSMesg            msg;
} OSScTask;

Table 24-1 OSScTask structure fields

Field         Description

next          Not used by client (used by the scheduler for list
              management).

state         Not used by client (used by the scheduler for state
              management).

framebuffer   Address of the frame buffer for this task (if it is a
              graphics task).

list          Structure containing task code and command list data
              (described below).

msgQ          The message queue on which the client is to receive the
              task done message.

msg           The message that the client is to receive when the task is
              done.
Table 24-2 OSTask structure fields

Field             Description

type              Task type; should be initialized to M_AUDTASK for audio
                  tasks or M_GFXTASK for graphics tasks.

flags             Various state bits; should be initialized to 0 for audio
                  tasks, or OS_TASK_DP_WAIT for most graphics tasks.

ucode_boot        Pointer to boot microcode; should be initialized to
                  rspbootTextStart.

ucode_boot_size   Boot microcode size in bytes; should be initialized to
                  ((u32)rspbootTextEnd - (u32)rspbootTextStart).

ucode             Pointer to task microcode. Should be set to one of
                  gspFast3DTextStart, gspFast3D_dramTextStart,
                  gspLine3DTextStart, or gspLine3D_dramTextStart for
                  graphics tasks; otherwise aspMainTextStart for audio
                  tasks.

ucode_size        Size of microcode; should be initialized to
                  SP_UCODE_SIZE.

ucode_data        Pointer to task microcode data. Should be set to one of
                  gspFast3DDataStart, gspFast3D_dramDataStart,
                  gspLine3DDataStart, or gspLine3D_dramDataStart for
                  graphics tasks; otherwise aspMainDataStart for audio
                  tasks.

ucode_data_size   Size of microcode data; should be initialized to
                  SP_UCODE_DATA_SIZE.

dram_stack        Pointer to DRAM matrix stack; should be initialized to 0
                  for audio tasks and to a memory region of size
                  SP_DRAM_STACK_SIZE8 bytes for graphics tasks.

dram_stack_size   DRAM matrix stack size in bytes; should be initialized
                  to 0 for audio tasks or SP_DRAM_STACK_SIZE8 for graphics
                  tasks.

output_buff       Pointer to output buffer. The "_dram" versions of the
                  graphics microcode will route the SP output to DRAM
                  rather than to the DP. When this microcode is used, this
                  should point to a memory region to which the SP will
                  write the DP command list.

output_buff_size  Pointer to store output buffer length. The SP will write
                  the size of the DP command list in bytes to this
                  location.

data_ptr          SP command list pointer. For graphics tasks, this is the
                  application constructed display list. For audio tasks,
                  this command list is created by alAudioFrame(3P).

data_size         Length of SP command list in bytes.

yield_data_ptr    Pointer to buffer to store saved state of yielding task.
                  If the application is going to support preemption of
                  graphics tasks, the graphics tasks should have this
                  structure member set. This should point to a memory
                  region of size OS_YIELD_DATA_SIZE bytes. If task
                  preemption is not supported by the application, this
                  field should be initialized to 0. Audio tasks should
                  always set this field to 0.

yield_data_size   Size of yield buffer in bytes. When task yielding is to
                  be supported by the application, this should be
                  initialized to OS_YIELD_DATA_SIZE for the graphics task.
                  This should always be 0 for audio tasks.



Note: Refer to the osSpTaskLoad man page for information about the
alignment restrictions of the data pointers.



Sending Tasks to the Scheduler: osScGetTaskQ() 

Once you have created and initialized a Scheduler task, you can send it to
the Scheduler thread via the Scheduler's task queue. You can obtain a
pointer to this queue by calling osScGetTaskQ().

The Scheduler will read this task queue after the next retrace message from
the Vi Manager. Normally, you will send one audio and one graphics task to
the Scheduler each frame.

Note: After you send the task to the Scheduler, you should not modify it 
until you receive the "done" message. 
Chapter 25



GameShop Debugger 



This chapter describes the game debug environment for the
Nintendo 64 system. It briefly explains the hardware and software
environment, illustrates the recommended programming model, tells you how
to get started with the debug environment, and introduces you to the most
commonly used debugger features.

Hardware Environment 

For the development system, the ROM on the game cartridge is replaced by
RAM on the development board; in this chapter, we refer to it as "virtual
ROM". This allows the game developer to load the game program into
memory, control its execution, and observe the effects of modifying the game
without having to rebuild from source.

The development board plugs into the GIO bus of the workstation. Audio
and video output connections are provided. Communication facilities
between the workstation (referred to as the host in the rest of this chapter)
and the development board (called the target) are via the RAM devices that
emulate the cartridge ROM and several registers provided for handshaking
and synchronization.

Software Environment

The software debug environment consists of a number of software modules
that must be present to support debugging. Some of these will also be
present in the final game system, but many will not. A good understanding
of the software architecture will enable the game developer to deal with
unexpected situations that arise during a debug session.

At the highest level, the debugger consists of two major parts. On the
development host, a graphically oriented source-level debugger called gvd
is provided. In the target system, a small in-circuit debug monitor called
rmon acts as the agent for gvd. The operator of the debugger sees only gvd,
but requests are actually fulfilled by rmon. That is, you may open a window
on the host for the purpose of looking at memory contents. The host cannot
access such memory directly, but it can ask rmon to fetch the memory
contents from the target so that the host can display them. rmon runs as three
threads under the OS, but these threads spend most of their time either
blocked (awaiting a host request) or stopped. Thus, they do not interfere
with the operation of the game (other than taking up some memory) unless
they are processing debugging commands under operator control.

Like the OS and other library routines, rmon is included in a build only if the
game developer specifically asks for it. This is done by creating a thread with
rmonMain specified as the function to be started when that thread is run.
The rmon program is part of libultra, the Nintendo 64 run-time library. You
do not need to have any special files to include rmon in a build. Referencing
rmonMain automatically includes all code and data for all three of rmon's
threads.

On the host side, the main program you see is gvd, the debugger. However,
there are a number of support programs that run in conjunction with the
debugger. Since gvd is designed to work in other environments as well, it
uses a separate program called dbgif (for debugger interface) to
communicate with the target environment. Only dbgif knows the actual
means of communication with the target system; gvd is independent of such
concerns.

Since we wish to share the GIO interface between the host and target with
other programs (for example, diagnostics), a third module is provided on the
host. This is a device driver built into the UNIX kernel, and functions as the
target manager. When any program (such as dbgif) wishes to communicate
with the target, it issues requests to the u64 device driver. In this way, it is
possible for two pairs of programs running on the host and target to
communicate through a single channel without interference.
Rmon Theory of Operation

As mentioned in the previous section, rmon consists of three threads that run
under the operating system, but these threads run very infrequently. The
rmon main thread consists of a command parser, a command dispatcher,
and a collection of service routines. In operation, the debugger sends a
request to the target. This request consists of a number of 32-bit words that
describe the work to be done; for example, "read 40 words starting at
address 0x10000000 in the address space of thread 6."

Note: All threads run in the same address space in this environment, but the
debugger could support a more complex environment where this was not
the case. The debugger does consider the RCP to be a separate address space
internally.
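
To make the idea of a request built from 32-bit words concrete, here is a minimal sketch that packs the example request. The word layout, the REQ_READ_MEM code and pack_read_request() are all hypothetical: this chapter does not give the actual rmon packet format, so treat this purely as an illustration of the concept.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request code; the real protocol values are not given
 * in this chapter. */
enum { REQ_READ_MEM = 1 };

/*
 * Pack a "read memory" request into 32-bit words.
 * Hypothetical layout:
 *   word 0: request code
 *   word 1: thread ID (selects the address space)
 *   word 2: starting address
 *   word 3: number of words to read
 * Returns the number of words written, or 0 if buf is too small.
 */
size_t pack_read_request(uint32_t *buf, size_t cap,
                         uint32_t thread_id, uint32_t addr,
                         uint32_t nwords)
{
    if (cap < 4)
        return 0;
    buf[0] = REQ_READ_MEM;
    buf[1] = thread_id;
    buf[2] = addr;
    buf[3] = nwords;
    return 4;
}
```

The request from the text, "read 40 words starting at address 0x10000000 in the address space of thread 6", would be packed as pack_read_request(buf, 4, 6, 0x10000000, 40).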

This request is passed through dbgif to the driver. The host (through
operation of the driver) alerts the target that it wishes to send a message. A
very small high-priority thread called the rmon IO thread responds to the
interrupts that occur when the driver writes to one of the GIO registers. Only
one access to the "virtual ROM" is allowed at a time, so the host must wait
until any DMA access in progress is completed.

When this has happened, the target notifies the host that it is now possible
to use the memory. At this point, the target system starts a high-priority
system thread (the rmon spin thread) that keeps the game from running and
starting any more accesses to virtual ROM. Since the game is not accessing
this memory, the host is now free to load the request packet into a
predetermined location at the high end of memory. When the packet has
been deposited in memory, the host notifies the target that a request has
arrived. This stops the rmon spin thread. The rmon IO thread notifies the
main rmon thread and waits for the next interrupt.

The rmon main thread wakes up in response to the message from the rmon
IO thread. It fetches the incoming packet and dispatches a service routine
based on what service was requested. In our example, rmonReadMem will
be called. This function examines the arguments, reads the memory, and
deposits the contents in another section of virtual ROM as part of a reply
packet. It then sends an interrupt to the host, alerting it to the arrival of the
reply packet in memory. The host responds to this interrupt by copying the
reply packet out of virtual ROM and sending another interrupt to the target.
This provides feedback to the target that the host is finished with the reply
buffer and the target may use it again.

Most transactions between the host and target follow this model, but there
are a few exceptions. It is likely that the target will asynchronously send a
packet to the host that is not a reply to a host request. This occurs whenever
a breakpoint has been encountered, for example. The host and target "sign
on" when starting, and each has a reply that it sends to the other when such
a sign-on is received. The debugger can also process notification that a
thread has been created or destroyed. While not currently used, these may
be added in the future.

Target-generated interrupts are received by the driver on the host system
and routed to processes (for example, dbgif) that have registered that they
would like to receive a given set of interrupts. (Interrupts are associated with
a six-bit value identifying which interrupt occurred.) Thus, rmon sends a
specific interrupt code to the host. This code indicates that the message
should be sent to dbgif rather than some other process. The driver does not read
the communication buffers except as an agent for dbgif or another
application process.



Programming Model 

While a game may use any programming style desired by its author(s), there
are certain restrictions imposed by the debugger. Those developers who
want to use the debugger must conform to the rules of the programming
model to obtain the benefits of source-level debugging. This section
discusses the restrictions that apply.

The most obvious requirement is that you must use the OS, since the
debugger depends on it. It will not work under an OS of your own design,
because it is designed for the Nintendo 64 OS.

Use of the debugger also requires that you restrict thread priorities to a 
specific range. User threads (those that are part of the game) are assigned the 
range 1 through 127, with 127 being the highest-priority thread. The OS does 
not prevent you from assigning thread priorities higher than 127, but you 
will be unable to debug them. In fact, use of priorities in this range may 
prevent the debugger from working at all. While the OS does not impose any 
restrictions on the idle thread (other than the requirement that there be one),
the debugger requires that the idle thread be assigned priority level zero. It
is not sufficient that it be the lowest priority thread in the system: it must be
zero. Otherwise, the debugger may attempt to suspend it, which will lock up
the system. The rmon main thread should be set to priority
OS_PRIORITY_RMON.
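
These priority rules are easy to check at build or startup time. The helpers below are a minimal sketch: the macro and function names are hypothetical, and only the numeric limits (0 for the idle thread, 1 through 127 for user threads) come from the text.

```c
/* Priority limits described in the text. The names are hypothetical. */
#define DBG_PRI_IDLE      0     /* idle thread must be exactly zero */
#define DBG_PRI_USER_MIN  1
#define DBG_PRI_USER_MAX  127

/* Returns 1 if a user (game) thread at this priority can be debugged. */
int dbg_user_priority_ok(int pri)
{
    return pri >= DBG_PRI_USER_MIN && pri <= DBG_PRI_USER_MAX;
}

/* Returns 1 if the idle thread priority satisfies the debugger.
 * Being the lowest priority in the system is not enough. */
int dbg_idle_priority_ok(int pri)
{
    return pri == DBG_PRI_IDLE;
}
```

A debug build could assert these conditions for each thread before creating it, so that a priority outside the debuggable range is caught early rather than showing up as a debugger that fails to attach.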

The boot procedure for the system is described elsewhere, but some parts of
it are repeated here because a review is helpful. Each application has a boot
function, which is called at startup (after security checking, of course). The
boot function initializes the operating system and then creates and starts the
main thread. The boot procedure may also do other things, such as hardware
initialization, if desired. It may also create other threads, but starting a thread
is always the last thing the boot procedure does. The reason for this is
simple: once control is transferred to a thread, there is no way to get back to
the boot procedure. To enable as much debugging of your start-up code as
possible, the boot procedure should be minimal, probably just the three
function calls that are required to start the main thread.

The main thread starts other threads within the system, including the
debugger thread. There is more flexibility here, although the ability to debug
system startup is significantly better if the recommended model is followed.
The recommended model is for the main thread to create all other threads in
the system, start only the rmon thread(s), and then lower its own priority
to zero and become the idle thread. Again, you don't have to do this, but debugging
will work much better if you do.

Clearly you can't debug any code that comes before starting the debugger 
(rmon) thread. It is also the case that you can't really debug code that has 
already executed by the time the debugger starts up. This is not so much a 
function of time as it is of the traditional approach used in debugging
embedded systems like the Nintendo 64. That is, if you want to watch the
system start from inside the debugger then you can't really start running the
application. Since the debugger is just another thread under the OS, it does
not keep your application from running off and executing the game
application. Some debuggers may "hold off" the application until the
debugger is ready; this one doesn't.

Of course, this does not mean that you can't debug the startup of your
application. It just means you must bring up your system in a stopped state
and start it running from within the debugger. To do this, your code should
start only two threads (although it can create as many as it wants, since
creating a thread does not cause it to run). The two threads are the rmon
thread, which is treated as a single thread here, and the idle
thread. Comment out or conditionally compile in the osStartThread calls for
other threads so that they do not run until told to do so. Running a thread
from the debugger is exactly like calling osStartThread.

What happens if you don't follow this procedure and you start all the
threads in your system? Unfortunately, in most cases the debugger will be
harder to start, since it needs a stopped thread to connect to. The idle thread
and the debugger threads will be running, but it is likely that all your
application threads will be blocked on some event. Since the OS now allows
waiting threads to be stopped, you may bring up the application in a
running state, use the multithread view to stop the thread to which you will
attach, and then use Switch Thread to connect.



Using the Debugger 

Once you have all the required software installed on your system, you can
modify your application to include rmon. Since rmon is rather passive, it
does not require you to run the debugger. It just waits for incoming requests
and does not interfere with the game operation unless requests arrive. An
include file, rmon.h, is provided as part of the distribution. It should be
included by the file that creates and starts the rmon thread.

Once you have built your application, you are ready to debug it.

1. Start dbgif in a window of its own.

2. Download your application with gload. 

3. You may now start gvd itself. 

For the Nintendo 64, it is required that gvd be started with the name of
your executable (the boot executable, if there is more than one) on the
command line. For example, if your executable is named sample, you
would enter:

gvd sample

The debugger starts. It makes no attempt to contact the target system
yet.

You should have a source window and a small status window (which
may be minimized if desired). Now you must establish a link to the
target.

4. Select the Admin pulldown menu and click Switch Thread.

You will be prompted for the ID of the thread to which you wish to
connect. Under the OS, threads do not really have small integer IDs;
instead, they are referenced by the address of their thread control
blocks. When you created the thread initially, you assigned it an ID for
the debugger to use.

5. Specify the ID you assigned to the thread to which you will be
attaching.

You may only attach to a thread that is in a stopped state. If you start
the application with all threads stopped as recommended above, you
will not have any problems attaching.

Once you have successfully attached, the host and target will communicate
to pass information about the system state back and forth. This takes a few
seconds, or even longer if you have many threads. Once completed, you may
bring up other views as appropriate to your debug session. Open views by
selecting the Views pulldown menu and then clicking on the view you wish
to use. The most frequently used of these are:

• register view

This is where you may examine or modify the contents of all R4300
registers (except for some system control registers). Note that these
registers apply to the thread to which you are currently attached.
Switching threads with this view open refreshes it with the register
contents for the new thread. You can only examine and modify the
registers of a thread that is stopped.

• memory view

As you would expect, this is where you examine and modify memory
contents. You may specify the window origin by address or symbol.
This window has two modes. In single-word mode, it displays and
modifies exactly one memory word without touching any other
locations. This is the mode you would use for dealing with
memory-mapped registers. In block mode, it displays a block of
memory from the specified starting address. The size of the block is
mostly determined by the size of the window on your screen.
Stretching the window gives you more memory to look at. Shrinking it
gives you less. You may specify the base in which you wish memory to
be displayed.

• disassembly view

This view shows you memory contents as disassembled code based on
the current PC value, or else disassembled from some address you
specify. The source line corresponding to the disassembled memory is
also displayed. There are a number of configuration options for this
window that let you customize it to the display that you find most
useful.

• trap manager

This view shows you all breakpoints that are set. Breakpoints also show
up in the source and disassembly windows as pink lines. The current
PC shows up as a green line.

The source view, which is the main view of gvd, consists of a set of control
buttons for running and stopping the selected thread, plus two other
windows. The source window (the middle portion of the view) displays the
source at the current PC (by default), and tracks the program counter to keep
it on screen whenever possible. You may set breakpoints here by clicking in
the margin to the left of the line at which you wish to set the breakpoint.

The bottom of the source view is a small command line window where you
may enter commands and see the results. The mouse cursor must be in this
window to use it. This window is usually used to examine data objects like
structures. For example, if you wish to look at a message queue called
audioMQ, you can enter print audioMQ, and the contents of the structure
(including all its members) will be printed. Since the compiler and debugger
were designed to work together, the debugger has quite good type
information for displaying complex structures like this.

If you plan to use this window much, it is probably a good idea to move the
debugger higher on the screen and stretch the bottom down to enlarge the
command portion of the view. The default size is a bit small. This window
accepts most dbx commands, for those of you familiar with this popular
UNIX debugger.

The command window is also useful for setting breakpoints in functions
that are not on screen because they are in a different source file. While you
can always change source files and set a breakpoint, it is more convenient
(providing you wish to stop at the start of a function) to use the "stop in"
command. If you know that you are trying to isolate a problem in a function
called sendDisplayList, then it is probably best to type stop in
sendDisplayList in the command window, then click Continue. This
will run your application until any thread enters the specified function.

Note: Encountering a breakpoint stops all threads with priorities in the user
range (1 through 127). In general, coprocessor interrupts are blocked while
rmon is running, and CPU interrupts are disabled.

The Admin pulldown menu also contains a few other useful items. First, this
is how you exit the debugger. You may also change to a different executable
here, but you should then do another Switch Thread command. There is a
multithread view in this menu, which is useful to have opened if you use
more than one thread. It allows you to start and stop threads as a group, and
indicates whether a given thread is running or stopped. If stopped, it shows
you which function it was executing. It also shows you the name of the
thread data structure used in thread system calls.

You will probably find gvd to be fairly intuitive, especially if you have used
other source level debuggers. The online help should answer most questions
that arise in debugger operation.
NINTENDO DRAFT PERFORMANCE TUNING GUIDE

Chapter 26 

Performance Tuning Guide •" •■. :V . . .,•;:',' • "H 



The following sections will discuss j 

• Data Reduction 

• Geometry Tuning 

• Raster Tuning

• CPU Tuning

Data Reduction



Game World Organization

The most important performance tuning technique in graphics is to discard
as much geometry as possible before animation computation and rendering.
Depending on your game, you can organize the geometry in several ways
that enable rapid culling of large quantities of data. One example is a simple
grid of fixed-sized regions:

Figure 26-1 Fixed Size Grid Database Organization
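
As a rough illustration of the fixed-size grid, the sketch below maps a view rectangle to the range of grid cells it overlaps, so that only those cells need further processing. The grid dimensions, the cell size, and every name used (CellRange, grid_visible_cells, cell_range_count) are assumptions made for this example; they are not part of the Nintendo 64 libraries.

```c
/* Hypothetical grid layout: these constants are illustrative only. */
#define GRID_W     16       /* cells along X */
#define GRID_H     16       /* cells along Y */
#define CELL_SIZE  64.0f    /* world units per cell edge */

/* Inclusive range of cell indices touched by a rectangle. */
typedef struct {
    int x0, y0, x1, y1;
} CellRange;

/* Map a world coordinate to a cell index, clamped to the grid. */
static int world_to_cell(float w, int cells)
{
    int c;

    if (w < 0.0f)
        return 0;
    c = (int)(w / CELL_SIZE);
    return (c > cells - 1) ? cells - 1 : c;
}

/* Cells overlapped by an axis-aligned view rectangle. The rectangle is
 * assumed to overlap the grid; one lying fully outside is clamped to
 * the nearest edge cells. */
CellRange grid_visible_cells(float min_x, float min_y,
                             float max_x, float max_y)
{
    CellRange r;

    r.x0 = world_to_cell(min_x, GRID_W);
    r.y0 = world_to_cell(min_y, GRID_H);
    r.x1 = world_to_cell(max_x, GRID_W);
    r.y1 = world_to_cell(max_y, GRID_H);
    return r;
}

/* Number of cells that still need animation and rendering work. */
int cell_range_count(CellRange r)
{
    return (r.x1 - r.x0 + 1) * (r.y1 - r.y0 + 1);
}
```

With these assumed dimensions, a view rectangle from (100, 100) to (200, 130) touches 6 of the 256 cells; everything stored in the other cells is discarded before any animation or rendering work is done.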
You could also build a hierarchy of different-sized grids to give you a
quadtree:

Figure 26-2 Quadtrees



You can extend this to 3D and get either a fixed size cube organization or
octrees. Keep in mind that you are trying to eliminate work; not just graphics
rendering but also the game logic and animation processing such as collision
detection.

The grid need not be regular either; you could also use other boundaries if it
suits your data. One example of this is a "portal connectivity" organization
inside of a building. In a building with rooms and hallways, the possible list
of things that you can see can be represented by a portal connectivity
description, which lists which rooms of the building are possibly visible.
You can further reject more data by testing a list of screen-projected portal
rectangles against visibility to determine whether to consider a
particular room or hallway.

Figure 26-3 Portals Connectivity Visibility
Hierarchical Culling

Throwing away geometry to eliminate processing does not have to stop at
the top level. A common organization at the object level is a bounding
volume test to eliminate objects (see gSPCullDisplayList()).

Figure 26-4 Bounding Sphere Test
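
The bounding sphere test can also be performed on the CPU before an object's display list is built at all. The following is a minimal sketch under stated assumptions: the Plane and Sphere types, the plane convention (a point is inside when n.p + d >= 0), and the function names are all hypothetical and are not taken from the Nintendo 64 libraries.

```c
/* Hypothetical types for a CPU-side bounding sphere test. */
typedef struct {
    float nx, ny, nz;   /* plane normal, pointing into the view volume */
    float d;            /* inside points satisfy n.p + d >= 0 */
} Plane;

typedef struct {
    float cx, cy, cz;   /* sphere center */
    float radius;
} Sphere;

/* Nonzero when the sphere lies entirely on the outside of the plane. */
int sphere_outside_plane(const Sphere *s, const Plane *p)
{
    float dist = p->nx * s->cx + p->ny * s->cy + p->nz * s->cz + p->d;

    return dist < -s->radius;
}

/* Nonzero when the object can be discarded: being fully outside any
 * single plane of the view volume is enough to reject it. */
int sphere_culled(const Sphere *s, const Plane *planes, int plane_count)
{
    int i;

    for (i = 0; i < plane_count; i++) {
        if (sphere_outside_plane(s, &planes[i]))
            return 1;
    }
    return 0;
}
```

A sphere that straddles a plane is kept, so the test is conservative: it never rejects a visible object, but it may keep a few that turn out to be invisible.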
Geometry Tuning (gspFast3D - Precise Microcode)



The standard gspFast3D microcode contains very precise subpixel x,y
calculations for antialiasing and precise s,t calculations for large screen area
textures. This precision is required for terrain or background polygons that
are large.

This microcode is full featured, including lighting, clipping, and texture
coordinate generation (reflection mapping).



Vertex Grouping

The geometry microcode has a local vertex cache. Loading a block of
vertices can amortize the cost of per vertex calculations (transformation,
lighting, texture coordinate computation).

Careful organization of the database can minimize these calculations. In
general, it is best to load the vertex cache with as many vertices as possible,
then draw all the geometry which uses those vertices.



Pre Lighting

For non-dynamic lighting effects, lighting computations can be calculated at
modeling time, then rendered with simple Gouraud shading.



Clipping and Lighting

This microcode does not have enough instruction space to hold lighting and
clipping code. It swaps them in from the DRAM using a least recently used
algorithm. Since lighting occurs during vertex load and clipping occurs
during polygon drawing, there are natural blocks of work following each
ucode load. Loading just a few vertices and then drawing a small number of
triangles will cause this microcode loading to "thrash".

Note: We have not seen performance degradation due to this swap in any
games. Game developers did not realize that this was happening until we
told them. Large block DMA transfers (such as microcode loads) are very
efficient.



Kinds of Polygons 

The cost of geometric processing in the RSP is listed below in the order of
decreasing performance.

• Flat Shade (using gDPSetPrimColor(3P) to select the color)

• Gouraud Shade

• Gouraud Shade + Z- buffer 

• Gouraud Shade + Texture 

• Gouraud Shade + Z-buffer + Texturing 

Textures instead of Geometry 

When possible, use textures to represent complex geometry. The RCP is
designed to draw high-quality textured primitives. Achieving complexity
by using additional geometry will always be slower than using textures.

Geometric Level of Detail 

When objects get far away or have rapid animation, you can render them with
less geometric detail without a noticeable loss of quality.
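
One simple way to apply this is to pick a model version from the viewing distance and step down one more level for fast-moving objects. The sketch below assumes three model versions and two arbitrary distance thresholds; LOD_COUNT, lod_far_limit, and select_lod are hypothetical names introduced for illustration.

```c
/* Hypothetical level-of-detail table: level 0 is the full model and
 * level LOD_COUNT - 1 is the coarsest. The distances are placeholders. */
#define LOD_COUNT 3

static const float lod_far_limit[LOD_COUNT - 1] = { 200.0f, 600.0f };

/* Choose a model version from the viewing distance. An object that is
 * animating rapidly is pushed one level coarser, since the lost detail
 * is hard to see while it moves. */
int select_lod(float distance, int fast_animation)
{
    int lod = 0;

    while (lod < LOD_COUNT - 1 && distance > lod_far_limit[lod])
        lod++;
    if (fast_animation && lod < LOD_COUNT - 1)
        lod++;
    return lod;
}
```

The thresholds would normally be tuned per object by eye, since the point at which the coarser model becomes noticeable depends on its size on screen.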
Geometry Tuning (Turbo Microcode)



The gspTurbo3D microcode is a feature-limited, precision-reduced,
optimized version of the 3D polygon microcode. It uses a completely
different display list organization that is more efficient, but less general.

Because of the reduced precision, the turbo microcode is not suitable for
drawing backgrounds or objects with precise texturing. It is designed to draw
"characters", objects that generally remain in the center of the viewing
frustum.

The following features are not supported with the turbo microcode: 

• clipping 

• dynamic lighting

• perspective-corrected textures

• matrix stack operations

• antialiasing (antialiasing is supported, but not as well).

Current performance measurements of this microcode are >5K polygons per
frame @ 60 Hz. For more information, consult the man page for gspTurbo3D
(3P).

This microcode is in its first release and may change.
Raster Tuning (Fillrate)



Disable Atomic Primitives

Atomic primitive mode (gPipelineMode(G_PM_1PRIMITIVE)) is intended
to avoid span buffer coherency problems which can be caused by successive
primitives with overlapping spans during "read-modify-write" modes
(z-buffered or blended modes). The 1PRIMITIVE mode inserts a delay into
the pipeline between each primitive to make sure there are no overlaps.

In reality, the overlap case is very rare, and would be hard to see unless you
were looking for it. In the worst case, the lost cycles between primitives can
add up to about 1-1.5 Mpixels/sec of lost fillrate.

To disable the atomic primitive mode, use the command
gPipelineMode(G_PM_NPRIMITIVE).



Partial Sorting for Z-Buffer

A "partial sorting" of objects being drawn can accelerate rendering when
using z-buffering. The z-buffer test is a conditional write, so if objects are
drawn in roughly front-to-back order, this test will often prevent the write to
update the z-buffer value.
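
A partial sort can be as coarse as ordering whole objects by one representative depth each. The sketch below uses the standard C library qsort for this; the DrawItem type, its depth field (distance from the viewer, larger meaning farther), and the function names are assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical per-object record: one representative depth per object
 * is enough for a partial sort. Larger depth means farther away. */
typedef struct {
    int   id;
    float depth;
} DrawItem;

/* qsort comparator: nearest objects first. */
static int compare_near_first(const void *a, const void *b)
{
    const DrawItem *da = (const DrawItem *)a;
    const DrawItem *db = (const DrawItem *)b;

    if (da->depth < db->depth) return -1;
    if (da->depth > db->depth) return 1;
    return 0;
}

/* Order objects roughly front-to-back before building the display
 * list. Polygons inside each object are left in their stored order. */
void sort_front_to_back(DrawItem *items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_near_first);
}
```

Because the z-buffer still resolves visibility exactly, the order only has to be approximately right to save z-buffer writes; errors in the sort cost fillrate, not correctness.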



No Z-Buffer 

Z-buffer causes a major penalty in fillrate. Antialiasing also causes some
performance loss in fillrate. We have included a simple performance tool
(blockmonkey) in the release to give you a feel for geometry and fillrate
performance.

There are many visibility sorting algorithms available and even more
hybrids of these algorithms. There are also properties of particular games
that impart valuable information about depth order. If a game can use these
techniques and avoid z-buffering, performance will improve.
Convex Objects

If a group of objects are all convex, a centroid or bounding volume sort and
back-face rejection will give the proper rendering order.

Meshed Objects

Many meshed objects have a small number of mesh traversal orders which
are correct sorts at arbitrary orientations, even though they are concave.
Meshed objects are topologically 2D, for example, a torus, a terrain height
field, building corridors, etc. With one batch of vertex points, one of several
polygon descriptor display lists could be selected by view location. For
example, the polygons in a terrain mesh might have four orders across the
mesh, S+T+, S-T+, S+T-, S-T-. The two sides of the mesh then closest to the
viewpoint select the order.

Cell Based Scenes

Cells are simply a higher level of mesh, where the cell draw order can be
determined from view.

Layered Scenes

Often layers of data are known never to be behind one another (buildings on a
landscape, furniture in a room); the layers can then be drawn in this order,
with only a sort within each layer.

Bucket Sort 

Attractive since data need only be accessed once. A linked list of buckets can
avoid local overflow without excessive memory usage; the bucket can be a
display list, for example, of calls to clumps.
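
A bucket sort of this kind might look like the sketch below. Each object is touched once when it is pushed onto the linked list for its depth range, and the buckets are then walked from far to near for drawing without a z-buffer. The bucket count, the DepthNode and BucketSet types, and the function names are assumptions for illustration; a real implementation might chain display list fragments rather than object ids.

```c
#include <stddef.h>

#define NUM_BUCKETS 8   /* illustrative bucket count */

/* Hypothetical list node: one per object (or clump) to be drawn. */
typedef struct DepthNode {
    int               object_id;
    float             depth;       /* distance from the viewer */
    struct DepthNode *next;
} DepthNode;

typedef struct {
    DepthNode *head[NUM_BUCKETS];
    float      z_near, z_far;      /* depth range covered; z_far > z_near */
} BucketSet;

void buckets_init(BucketSet *set, float z_near, float z_far)
{
    int i;

    for (i = 0; i < NUM_BUCKETS; i++)
        set->head[i] = NULL;
    set->z_near = z_near;
    set->z_far = z_far;
}

/* Each object is accessed exactly once: compute its bucket and push it
 * on that bucket's list. Lists grow as needed, so no bucket overflows. */
void buckets_insert(BucketSet *set, DepthNode *node)
{
    float t = (node->depth - set->z_near) / (set->z_far - set->z_near);
    int i = (int)(t * NUM_BUCKETS);

    if (i < 0) i = 0;
    if (i > NUM_BUCKETS - 1) i = NUM_BUCKETS - 1;
    node->next = set->head[i];
    set->head[i] = node;
}

/* Walk the buckets from farthest to nearest, writing object ids in
 * draw order. Objects sharing a bucket are not ordered further. */
size_t buckets_back_to_front(const BucketSet *set, int *out_ids,
                             size_t max_ids)
{
    size_t count = 0;
    int i;

    for (i = NUM_BUCKETS - 1; i >= 0; i--) {
        const DepthNode *n;

        for (n = set->head[i]; n != NULL && count < max_ids; n = n->next)
            out_ids[count++] = n->object_id;
    }
    return count;
}
```

The nodes are supplied by the caller, so the only extra memory is the fixed array of list heads, which keeps the memory cost predictable.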

Avoid Cyclic Objects 

Clumps of polygons in which NO sort order is correct (three long triangles 
arranged in a triangle in which at each corner a different triangle is in front) 
have no visibility solution without subdivision. 
Game-Specific Visibility



Many game situations provide implied visibility order between objects or
even within objects. Consider a jet fighter flight simulation game. The player
is always moving "forward" (in general) and targets attack from a limited
number of directions. This could allow you to model the targets carefully
and achieve correct surface visibility determination, even if they are not
strictly convex.



No Antialiasing

Turning off antialiasing can help increase fillrate. To minimize the aliasing
effects, you can increase the horizontal resolution of the framebuffer.
Performance tests (blockmonkey) show that 512x240 "no AA no ZB" is faster
than 320x240 "AA no ZB" on large polygons. In some cases, this is better
than a 25% gain, in exchange for an increase in framebuffer size.
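
The memory side of this trade can be estimated with simple arithmetic. The sketch below assumes a 16-bit (2 bytes per pixel) color framebuffer, which the text above does not state; framebuffer_bytes is a hypothetical helper, not a library routine.

```c
/* Size in bytes of one color framebuffer. The pixel size is passed in
 * because it is an assumption here: 2 bytes for a 16-bit framebuffer. */
unsigned long framebuffer_bytes(unsigned width, unsigned height,
                                unsigned bytes_per_pixel)
{
    return (unsigned long)width * height * bytes_per_pixel;
}
```

Under that assumption, a 320x240 framebuffer occupies 153,600 bytes and a 512x240 framebuffer occupies 245,760 bytes, an increase of 92,160 bytes per buffer (1.6 times the size). Multiply by the number of framebuffers your game keeps.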

On smaller polygons, you will pay a 5% to 10% fixed overhead due to
additional video bandwidth. Both antialiasing and dither filter video
hardware require fetching 3 scanlines and filtering down to produce a single
scanline of video.



Reduced Aliasing 

Reduced Aliasing refers to a blender mode (see the G_RM_RA* macros in
gbi.h) in which the color and the pixel coverage are only written instead of
the normal read/modify/write cycle. In this mode silhouette edges will be
antialiased, but internal edges of an object will not be antialiased. This
mode works with and without z-buffering.

Silhouettes can also have artifacts in this mode when displayed on top of a
surface which has edges through it, such as a tessellated background, which
has also been rendered in this mode. This is because the edges in the
background will be partial, rather than fully covered. In this case, the pixel
will have multiple partial fragments, and the antialiasing on the silhouette
will look wrong. A possible workaround for this problem is to render the
background in non-antialiased mode, which will write full coverage to the
framebuffer. Then render the foreground characters using this reduced
antialiasing mode.
Parallel Execution of the CPU and the RCP

Full speed rendering in the Nintendo 64 can only be accomplished by fully
utilizing all of its resources. One of the most powerful is the coarse-grain
parallelism that can be achieved between the CPU and the RCP.

There are many ways you can exploit this parallelism; here are some ideas:

• compute game and animation parameters for frame (n+1) while
frame (n) is rendered with the RCP

• compute game and animation parameters while another RCP task
is computing. If your game includes several RCP tasks per frame,
you can pipeline them so the CPU and the RCP are always busy at
the same time.

• instruct the RDP to render from a DRAM display list while the RSP
is used to compute another task, such as audio.
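
The first idea above depends on the CPU and the RCP never touching the same display list at the same time. One common arrangement, sketched here as an assumption rather than as the method used by any particular game, is to alternate between two display list buffers. The names NUM_DL_BUFFERS, cpu_build_buffer, and rcp_render_buffer are hypothetical.

```c
/* Two display list buffers: the CPU fills one while the RCP reads the
 * other. */
#define NUM_DL_BUFFERS 2

/* Buffer the CPU fills while it prepares frame n. */
unsigned cpu_build_buffer(unsigned frame)
{
    return frame % NUM_DL_BUFFERS;
}

/* Buffer the RCP reads during the same period: the one that was built
 * for frame n - 1. For frame 0 nothing has been built yet, so the
 * caller should not start an RCP task. */
unsigned rcp_render_buffer(unsigned frame)
{
    return (frame + NUM_DL_BUFFERS - 1) % NUM_DL_BUFFERS;
}
```

The two functions never return the same index for a given frame, so the CPU can write freely. The game must still wait for the RCP task to finish before it reuses that buffer on the following frame.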



Sorting

A detailed analysis of sorting algorithms is beyond the scope of this
document; the reader is referred to texts by Knuth 1 or Sedgewick 2, among
others. It is useful to review major properties of sorting algorithm analysis
and see how they relate to real-time system performance.

Properties of sorting algorithms which we want to compare include:

• best case sorting time

• worst case sorting time

• average case sorting time

• additional memory requirements

• size of the code to implement it

• ability to exploit coherence.

1 Knuth, D. E., The Art of Computer Programming, Volume 3: Sorting and Searching, Addison-Wesley Publishing, 1973, ISBN 0-201-03803-X.

2 Sedgewick, R., Algorithms in C, Addison-Wesley Publishing, 1990, ISBN 0-201-51425-7.

The time to sort is probably the most important; obviously we want to
choose an algorithm that is fast. But it is not that easy. Some of the fastest
sorting algorithms have the widest disparity between their average time and
their worst-case time. This makes it difficult to predict performance
necessary for a real-time system.

Often the difference between worst-average-best case performance is the
initial order of the data. By knowing what we are sorting (and why) we can
choose a better sort. For example, if we are sorting Z-values in order to
determine visibility drawing order, we can reason that this order varies only
slightly from frame to frame (objects do not move "dramatically" and sort
interchanges are local). By exploiting this frame to frame coherence, we can
choose a sort with linear performance for the "already nearly sorted" case,
speeding up our sort tremendously.
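
Insertion sort is one well-known algorithm with this property. The text above does not name a specific sort, so its use here is an assumption, and insertion_sort_depths is a hypothetical name. The sketch sorts an array of Z-values in place and returns the number of comparisons made, so that the effect of frame to frame coherence can be measured.

```c
#include <stddef.h>

/* Sort z[0..n-1] into ascending order in place. Returns the number of
 * comparisons performed, as a simple measure of the work done. */
size_t insertion_sort_depths(float *z, size_t n)
{
    size_t comparisons = 0;
    size_t i;

    for (i = 1; i < n; i++) {
        float key = z[i];
        size_t j = i;

        /* Slide larger values up until the slot for key is found. */
        while (j > 0) {
            comparisons++;
            if (z[j - 1] <= key)
                break;
            z[j] = z[j - 1];
            j--;
        }
        z[j] = key;
    }
    return comparisons;
}
```

For five values, input that is already sorted costs 4 comparisons, input with one adjacent pair swapped costs 5, and fully reversed input costs 10. The sort needs no additional memory, but its worst case grows quadratically, so it is only a good choice when last frame's order is used as the starting point.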

Additional memory requirements are also a major concern in an embedded 
system. They must be minimal, and most of all, predictable. Consider the
sorting problem when designing your data structures. 
Symbols

.aiff file 374 

.bnk file 426 

.ctl file 373, 378, 402, 447, 451

.inst file 76, 397, 449, 451, 457, 458, 459, 462 

.sbk file 423, 451, 454 

.seq file 451 

.sym file 402 

.tbl file 402, 426, 451

/usr/sbin 31 

/usr/src/PR 30 

/usr/src/PR/assets 30 

/usr/src/PR/conv 31 

/usr/src/PR/libultra 31 

/usr/src/PR/relnotes 30 

clearAudioDMA 444 

_gsDPLoadTextureBlock_4b 262 

Numerics 

0x0 122, 139 

0x80000400 120 

1/w 184, 186

3D transformations 63

4Dgifts 70 
64-bit, R4300 46 

9-bit RDRAM 318 

A

AA_EN 337

a-buffer 340
accuracy, z 325

active page register 58

ADD render mode 344, 345 
address 47 

ADPCM 369, 373, 385, 401, 402, 405, 412, 413, 414, 426, 43* 
455

ADPCM decoder 437
ADPCM decompressor 436

ADPCM predictor 436

ADPCM tools 455

ADSR 406, 430, 457, 458
AI 48, 86, 95, 102, 111, 114

AIFC 76, 412, 413, 435, 451, 455

AIFC spec 435 

AIFF 76, 374, 405, 412|§tl%, 1 426, 435, 451, 455, 462 
AIFF file 459 

AIFF-C 405

AL_FX_CUSTOM 388
AL_FX_ECHO 391



AL_FX_SMALLROOM 392
alAudioFrame 65, 372, 382, 383, 
ALBank 427 

ALBankFile 373, 377, 426 
alBnkfNew 373, 378, 426 
ALCSeq 376 
alCSeqGetLoc 377 
alCSeqNew 376, 377
alCSeqNewMarker 377
alCSeqNextEvent 377

alCSeqSecToTicks 377
alCSeqSetLoc 377
alCSeqTicksToSec 377

alCSPDelete 379

alCSPGetChlFXMix 380
alCSPGetChlPan 379
alCSPGetChlPriority 380
alCSPGetChlProgram 380
alCSPGetChlVol 380
alCSPGetSequence 379
alCSPGetState 379
alCSPGetTempo 379
alCSPGetVol 379
alCSPNew 379
alCSPPlay 379
alCSPSendMidi 380
alCSPSetBank 379
alCSPSetChlFXMix 380
alCSPSetChlPan 380
alCSPSetChlPriority 380
alCSPSetChlProgram 380
alCSPSetChlVol 380
alCSPSetSequence 379
alCSPSetTempo 379
alCSPSetVol 379
alCSPStop 379
ALDMANew 382 
ALDMAproc 382, 383, 384 
ALEnvelope 430 
alHeapAlloc 447
alHeapInit 372 
Alias 70, 71, 72 
aliased 271 
aliasing 271, 301 
alignment 48 
alignment, 16-bit 37, 58 
alignment, 16-byte 48 
alignment, 64 byte 36 
alignment, 64-bit 37, 58, 139, 320 



)5, 469l|#ik475 
alignment, 64-byte 210
alignment, color index palette 244 
alignment, image 320 
alignment, memory 58 
alignment, screen 272 
alInit 372, 382, 383, 386
ALInstrument 428
ALKeyMap 431
alpha 287, 332, 336
alpha combiner 291
alpha compare 205, 278, 298, 356 
alpha dither 312, 336 
alpha times coverage 337 
ALPHA_CVG_SEL 337, 338 
ALSeq 376 
alSeqGetLoc 377 
alSeqNew 376, 377, 378
alSeqNewMarker 376, 377 
alSeqNextEvent 376, 377 
ALSeqpConfig 397 
alSeqpDelete 379 
alSeqpGetChlFXMix 380
alSeqpGetChlPan 379 
alSeqpGetChlPriority 380 
alSeqpGetChlProgram 380 
alSeqpGetChlVol 380 
alSeqpGetSequence 379 
alSeqpGetState 379 
alSeqpGetTempo 379 
alSeqpGetVol 379 
alSeqpLoop 380 
alSeqpNew 378, 379 
alSeqpPlay 378, 379 
alSeqpSendMidi 380 
alSeqpSetBank 378, 379 
alSeqpSetChlFXMix 380
alSeqpSetChlPan 380
alSeqpSetChlPriority 380
alSeqpSetChlProgram 380

alSeqpSetChlVol 380
alSeqpSetSeq 378
alSeqpSetSequence 379
alSeqpSetTempo 379
alSeqpSetVol 379

alSeqpStop 378, 379
alSeqSecToTicks 376, 377
alSeqSetLoc 377

alSeqTicksToSec 376, 377
alSndpAllocate 373, 375



alSndpDeallocate 374, 375

alSndpDelete 374, 375

alSndpGetSound 375

alSndpGetStates 375

alSndpNew 373, 375

alSndpPlay 374, 375

alSndpPlayAt 375

alSndpSetFXMix 375

alSndpSetPan 375

alSndpSetPitch 375

alSndpSetPriority 375

alSndpSetSound 373, 375

alSndpSetVol 375

alSndpStop 374, 375, 458

ALSound 3f ^ 429 ^«tlP : 

alSynAddPl3$jjjSg4, 393, 394 

alSynAllocF|:ii§|l:; : : : ; ; : :! -, 

alSynAlloc#ce 38^f§93, 

alSynDelete 393 

alSynFreeFx 393

alSynFreeVoice 393

alSynGetFXRef 394

alSynGetPriority 393

alSynNew 382, 393
alSynRemovePlayer 393
alSynSetFXMix 386, 393

alSynSetFXParam 394

alSynSetPan 393

alSynSetPitch 393

alSynSetPriority 385, 393

alSynSetVol 393

alSynStartVoice 385, 393

alSynStartVoiceParams 393

alSynStopVoice 385, 393
ALVoice 384
ALVoiceHandler 395

ALWaveTable 373, 374 

ALWavetable 432 

ambient 156 

animation, sprite 273, 293 

antialiasing 46, 63, 74, 119, 175, 203, 204, 207, 301, 302, 327, 
340, 342, 343, 356, 496, 498, 501 

application thread 33 

artifacts, aliasing 271 

artifacts, antialiasing 328 

artifacts, filtering 274 

aspMainDataStart 474 

aspMainTextStart 474 

attack 374 
attack-decay-sustain-release 406, 430

audio 33, 372 

audio buffers 442 

audio command list 383 

audio DAC 41 

audio development tools 449 

audio DMA callback 383, 470 

audio heap 372, 382, 386, 442, 447 

audio interface 43, 46, 86, 102 

audio library 64, 65, 369 

audio playback 52 

audio playback rate 382 

audio processing 45 

audio system 449 

audio tools 401 

audio waveform 373 

Autodesk 3DStudio 71 



B 



back-face rejection 63, 154, 500

back-facing polygon 329 

background image 297 

bank 447, 457, 462 

bank control file 447 

bank file 377, 426, 449, 451, 454 

bank object 403 

bank, MIDI 30 

bilinear filter 193

billboard 205, 262, 286, 332, 333 

binary separating planes (BSP) 70 

bitmap 354 

BL 45, 176, 203, 204, 205 

blend 337

blend color 205, 206 

blender 45, 203, 301, 305, 310, 317, 327, 331, 345 

blender equation 310

blender mode bits, cycle-dependent JpB, 346 

blender mode bits, cycle-independent 345

blender mode, creation 345

blending 63

blockmonkey 499

blue screen photography 201

Boot 87

boot location 120

bounding volume 495

bounding volume sort 500

box filter 193

breakpoint 93, 486 

bss 123 



buffers, audio command list 442, 446

buffers, audio output 442

buffers, audio sample DMA 442, 444

buffers, audio sequence 442

buffers, sequence 447

buffers, sequencer event 442, 446

buffers, synthesizer update 442, 446

bus bandwidth 48

byte ordering 425

bzero 119, 123



C programming ianguage- 
C, middle C 431, 439, 452

c_dev 30

C3 452

C4 452

c a c h e c one ren cy [ ISvK 

cache flushing 54

cache invalidate 48
cache line 55, 118
cache line tearing 48

cache, data 118

caf:pNw|iw ; ay set -as so dative 55 

cache, vertex 72, 149

cache, writeback 118

cached address 128

cached, unmapped 47

CART 95

CaseVision 30

CAUSE register 93

CC 45, 176, 195, 200

cell based scenes 500 

centroid sort 500 

chroma key 201 

CI 190, 215, 221, 290

clamp, coverage 333 

CLD_SURF 343, 344, 345

clip ratio 152 

clipping 63, 152, 496, 498 

clock speed 48 

cloud 287, 336 

cloud surface 342 

cloud surface mode 344 

clouds 316 

CLR_ON_CVG 330, 337, 338 

codebook 436 

codecs 65 

coherency, span buffer 182 



47, 58jp7, 77, 137, 457 
color combiner 45, 193, 195, 200, 278, 288, 291, 295

color combiner input 196 

color combiner registers 197 

color combiner sources 195 

color index 188, 290 

color index texture 240 

color space conversion 194 

command buffer, RDP 109 

command list size, audio 446 

command list, audio 469 

command list, graphics 469 

comp.graphics 70

comp.sys.sgi 70 

compare, Z 320 

compiler, C 77 

compiler_dev 30 

compressed audio 373 

compression 281 

Computer Midi Interface 421 

computer monitor 74 

concave 500 

controller input 66 

controller interface 86 

controllers, sequence player 381 

conversion tools 31 

convex 501 

convex objects 500 

coordinate system 146

coprocessor 0, R4300 56

Coprocessor Unusable 93

copy mode 180, 277, 298

copy pipeline mode 276 

COUNTER 95 

coverage 184, 304, 306, 314, 333, 335, 337, 340, 342 

coverage overflow 337 

coverage unit 306

coverage value 331, 332 

coverage, zap 338 

CPU 41, 45, 48, 52, 54, 84, 89, 9jfi||l3, ||7, 450/46 

CPU Fault 37 

CPU_BREAK 95 

cracks 306

culling 492

culling, hierarchical 495

culling, polygon 154

culling, volume 154

CVG_DST 337, 338

CVG_DST_SAVE 317

CVG_X_ALPHA 337, 338



cyclic objects 500 



D 
DAC 370, 372, 450, 469

data cache, R4300 46, 47, 54, 118, 139

dbgif 31, 67, 480, 481, 482, 484

dbx 486

debugger 67, 90, 93, 124, 479, 480, 481, 482, 484

debugging 37

DEC_LINE 339, 341

decal 295, 337, 343

decal line mode 334, 3'4&M^ : fM 

decal surface 332, 333, 334

decay 374

degenerate polygons 331

deltaZ 304, 321, 323, 328, 341

depth compare 320

detail texture '229, 23"0l§|3 r . 

detune value 459 

dev 30 

development board 479
development system 48

device driver 401, 480

Device Manager 107

m 9'f r 

diffuse 156 
disassembler,^-!'" 

displafl!li : ir^ll5, 116, 135, 137, 141, 218 
display list, audio 65 
display list, optimal 142 
display list, RDP 45 
dither filter 501 
dither, alpha 312 
dither, color 210 
dither, noise 312

dither, screen coordinate based 312 
dithering, color 211 
divot 334 
DM 107 
DMA 37, 44, 46, 48, 54, 55, 56, 58, 101, 112, 114, 139, 383, 

445, 470 
DMA, audio 445 
DMedia 5.5 421 
dmedia_eoe (version 5.5) 30 
DMEM 44, 115, 135 
DP 86, 109, 114 
DRAM 60, 63, 239, 475 
DRAM, 9-bit 119, 210 
dynamic memory allocation 58 
effects 386

envelope 373, 377, 402, 406, 457, 458, 461 

environment color 197 
environment mapping 168 
error, Z 325 
event 84 

example application 384 
exception 37, 85, 93 
exception handler 85 
executable 484 
explosions 316 



far plane 325 

fast clears 45 

FAULT 34, 95 

fault handler 34, 93 

file system 87 

fill color 211 

fill mode 180 

FttXCOLOR 352 

filter 271 

filter, average 276 

filter, bilinear 193, 272, 274 

filter, bilinear restrictions 193 

filter, box 193 

filter, point sampling 193 

filter, triangular 275 

filter, video 314

fixed-point 144, 147, 185, 271

flip, texture 279

floating-point, R4300 46

flt2c 31, 72

fog 169, 179, 203, 205, 206, 313

fog alpha 318

fog color 205

FORCE_BL 317, 337, 338

format, image 318

fractal 234

frame rate, audio 443

FRAME_LAG 445

framebuffer 41, 43, 45, 46, 48, 49, 119, 203, 205, 210, 29£ 

frame buffer alignment 'S JGL, 

framebuffer, color 58

framebuffer, depth 58

frequency, texture 271

FRUSTRATIO_1 152

frustum clipping 63 



ftp 70 

G 
G_AC_DITHER 206, 316, 336
G_AC_NONE 206
G_AC_THRESHOLD 206, 298, 315
G_AD_DISABLE 312
G_AD_NOISE 312
G_AD_NOTPATTERN 312
G_AD_PATTERN 312
G_BL_1 317
G_BL_A_FOG 317
G_BL_CLR_IN 317
G_BL_CLR_MEM 317
G_CC_ADDRGB 198
G_CC_ADDRGBDECALA 198
G_CC_BLENDI 199
G_CC_BLENDIA 199
G_CC_BLENDIDECALA 199
G_CC_BLENDPEDECALA 289
G_CC_BLENDRGBA 199
G_CC_BLENDRGBDECALA 199
G_CC_CHROMA_KEY2 202
G_CC_DECALRGB 198
G_CC_DECALRGBA 198
G_CC_HILITERGB 199
G_CC_HILITERGBA 199
G_CC_HILITERGBDECALA 199
G_CC_INTERFERENCE 200
G_CC_MODULATEI 199
G_CC_MODULATEI_PRIM 199, 288
G_CC_MODULATEI2 200
G_CC_MODULATEIA 199
G_CC_MODULATEIA_PRIM 199
G_CC_MODULATEIDECALA 199
G_CC_MODULATEIDECALA_PRIM 199
G_CC_MODULATERGB 199
G_CC_MODULATERGB_PRIM 199
G_CC_MODULATERGBA 199
G_CC_MODULATERGBA_PRIM 199
G_CC_MODULATERGBDECALA 199
G_CC_MODULATERGBDECALA_PRIM 199
G_CC_PASS2 200
G_CC_PRIMITIVE 198
G_CC_REFLECTRGB 199
G_CC_REFLECTRGBDECALA 199
G_CC_SHADE 198
G_CC_SHADEDECALA 198
G_CC_TRILERP 200
G_CD_BAYER 312
G_CD_DISABLE 312
G_CD_MAGICSQ 312
G_CD_NOISE 312
G_CK_KEY 202
G_CULL_BACK 154
G_CULL_BOTH 154
G_CULL_FRONT 154
G_CV_K0 194
G_CV_K1 194
G_CV_K2 194
G_CV_K3 194
G_CV_K4 194
G_CV_K5 194
G_CYC_1CYCLE 181, 206, 310, 314
G_CYC_2CYCLE 181, 207, 263, 290, 310, 314, 344
G_CYC_COPY 181, 205, 276, 277, 315, 316, 344
G_CYC_FILL 181, 205, 315, 344
G_FOG 169, 207
G_IM_FMT_CI 189
G_IM_FMT_I 189, 288
G_IM_FMT_IA 189
G_IM_FMT_RGBA 189
G_IM_FMT_YUV 189
G_IM_SIZ_16b 189
G_IM_SIZ_32b 189
G_IM_SIZ_4b 189
G_IM_SIZ_8b 189
G_LIGHTING 168
G_MAXFBZ 211
G_MTX_LOAD 145
G_MTX_MODELVIEW 145, 157
G_MTX_MUL 145
G_MTX_NOPUSH 145
G_MTX_PROJECTION 145, 157
G_MTX_PUSH 145
G_OFF 150
G_ON 150
G_PM_1PRIMITIVE 183, 499
G_PM_NPRIMITIVE 183, 499

G_RM_AA_TEX_EDGE 287, 289, 291

G_RM_AA_ZB_OPA_SURF 204

G_RM_AA_ZB_OPA_SURF2 204

G_RM_CLD_SURF 317

G_RM_FOG_PRIM_A 204pf%207 

G_RM_FOG_SHADE_A 204, 205, 206, 314

G_RM_NOOP 299, 315

G_RM_OPA_SURF 344

G_RM_PASS 204, 205

G_RM_TEX_EDGE 289, 316

G_RM_VISCVG 346

G_RM_VISCVG2 346

G_RM_ZB_CLD_SURF 317

G_RM_ZB_OPA_SURF 299

G_RM_ZB_OPA_SURF2 206

G_TD_CLAMP 192

G_TD_DETAIL 192

G_TD_SHARPEN 192

G_TEXTURE_GEN jp 

G_TEXTURE_GEN_LINEAR 168

G_TF_AVERAGE 194, 276

G_TF_BILERP 194, 273, 275

G_TF_CONV 194

G_TF_FILT 194

G_TF_FILTCONV 194

G_TF_POINT 194, 272, 273

G_TL_LOD 192

G_TL_TILE 192, 290

G_TP_NONE 191, 269
G_TP_PERSP 191

G_TT_IA16 192

G_TT_NONE 192

G_TT_RGBA16 192
G_TX_CLAMP 189

G_TX_LOADTILE 225, 248, 292

G_TX_MIRROR 189, 279

G_TX_NOLOD 190, 279

G_TX_NOMASK 189 

G_TX_NOMIRROR 189, 279 

G_TX_RENDERTILE 225, 248, 273, 275, 276, 292 

G_TX_WRAP 189, 283 

G_ZS_PRIM 299 

gain 377 

game controller 29, 43, 46, 112
game timing 55

GameShop 30, 67 

gamma correction 74 

GBI 61, 62, 188, 216, 218, 248, 351 

GBI assembly 62 

gbi.h 137, 139, 337, 501 

gdis 37 

gDPFullSync 36

gDPSetColorImage 35

gDPSetMaskImage 35

gDPSetPrimColor 497

gDPSetTextureImage 35, 216

gDPSetTextureLUT 244, 246 

gdSPDefLightsO 157 
gEndDisplayList 353

General MIDI 467 

generation of the MIP maps 232 

geometric level of detail 497 

geometry 61 

ginv 28

GIO 48, 49, 479, 480, 481 

GIO board 27 

gl_dev 30 

gload 31, 34, 37, 78, 87 

Gouraud 496 

GPACK_RGBA5551 211 

GPACK_ZDZ 211

graphics 33 

graphics binary interface 61, 62, 72, 137, 216 

graphics overrun 471 

graphics pipeline 45, 135 

gsDPFillRectangle 172 

gsDPFullSync 182 

gsDPLoadMultiBlock 292 

gsDPLoadMultiTile 291, 292 

gsDPLoadMultiTile_4b 291 

gsDPLoadSync 192, 216, 248 

gsDPLoadTextureBlock 163, 166, 216, 225, 262 

gsDPLoadTextureTile 189, 248, 282 

gsDPLoadTextureTile_4b 189, 288

gsDPLoadTile 216, 225, 248 

gsDPLoadTLUT 216, 225 

gsDPPipelineMode 183 

gsDPPipeSync 181, 311

gsDPSetAlphaCompare 206, 316, 337

gsDPSetAlphaDither 312

gsDPSetBlendColor 311, 315

gsDPSetColorDither 312 

gsDPSetCombineKey 202 

gsDPSetCombineMode 262, 288, 29| 

gsDPSetCycleType 169, 181, 206, 0, 276, 277, 310 

gsDPSetCyleType 290 M r 

gsDPSetDepthSource 299, 309

gsDPSetEnvColor 289 

gsDPSetFogColor 169, 205, 207, 311, 313, 318, 3111 

gsDPSetKeyGB 202 

gsDPSetKeyR 202 

gsDPSetPrimColor 207, 288, 311

gsDPSetPrimDepth 299, 309, 311

gsDPSetRenderMode 169, 204, 205, 206, 291, 314, 337, 344, 345,

346

gsDPSetScissor 185, 311
gsDPSetTextureConvert 217



gsDPSetTextureDetail 192, 217 .M0^M-,^ 
gsDPSetTextureFilter 217, 272,273, 275, :: 2S§:;;. 
gsDPSetTexturelmage 248 »!^''- 

gsDPSetlexrureLOD 192, 217,|§g, , 
gs DPSeiTexrureLUT 2 1 6 B&^'Mp' 

gsDPSetTexturePersp 191, 216, 269, 270 
gsDPSetTile 216, 225, 248, 263 
gsDPSetTileSize 216,:225, 248, 263 
gsDPTexmreRectangie :; t69|: : ;273, 275, 276, 288 
gsDPTextureRectalgleFlip : i80; ; ,-,.,,, 
gsDPTileSync 1 9l| 2 1 6 
gsLoadTLUT 191 
gSPCullDispIayList°4# ! f| 
gSPDisplayList 35

gSPEndDisplayList 36
gspFastlj|i||, 137, 156, 161, 496 
gspFast3D_dramDataStart 474
gspFast3D_dramTextStart 474
gspFast3DDataStart 474
gspFast3DTextStart 474
gsPipelineMode 499
gspLine3D 63

gspLine3D_dramDataStart 474
gspLine3D_dramTextStart 474
gspLine3DDataStart 474
gspLine3DTextStart 474
gSPMatrix 35
gSPSegment 138
gSPSetGeometryMode 206
gspTurbo3D 63, 498
gSPVertex 35
gSPViewport 35, 152
gsSetAlphaDither 312
gsSetConvert 194
gsSetFillColor 211
gsSetPrimColor 198
gsSetTextureConvert 194
gsSetTextureFilter 194
gsSetTextureLUT 192
gsSP1Triangle 171
gsSPBranchList 142
gsSPClearGeometryMode 154
gsSPClipRatio 153
gsSPCullDisplayList 154
gsSPDisplayList 141
gsSPEndDisplayList 142, 154
gsSPFogPosition 169, 206, 207 
gsSPLine3D 171 
gsSPMatrix 145 
gsSPPerspNormalize 146, 308

gsSPPopMatrix 145 

gsSPSetGeometryMode 154, 168, 169, 206, 207 

gsSPSetLightsO 159 

gsSPTexture 150, 216, 228 

gsSPTextureRectangle 172, 216 

gsSPTextureRectangleFlip 172, 373

gsSPVertex 149, 160 

gsSPViewport 308 

guLookAt 144, 152, 163 

guLookAtHilite 162 

guLookAtReflect 166 

guOrtho 144 

guParseGbiDL 35 

guParseRdpDL 35 

guPerspective 144, 146, 152 

gvd 31, 34, 67, 87, 124, 480, 484, 486 

H 

heap library 58 
hidden bits 318, 324 
high resolution 46 
hinv 28 

host overrun 470 
HW2 interrupt 96 



I 188, 215, 221, 240, 247, 288 
I/O 56, 86, 101, 103 

I/O, asynchronous 104 

I/O, synchronous 104 .4 

IA 188, 215, 221, 240, 247, 289 

ic 76, 402, 403, 413, 462 

idle thread 33, 90 

ie 420

IM_RD 317, 337 

image conversion 70 

image conversion software 74 

image format 318

IMEM 44, 115, 135, 138 

immediate mode rendering 61 

Indy video input 29 

Indy workstation 27, 28, 29, 30, 48, 49, 421

Indy, and MIDI 421

initbsc 397, 398, 399

instruction cache, R4300 46

instrument 376, 377, 398, 134, 427^429^457, 461 

instrument compiler 362, 402, 403, 4l2^4f5- 

Instrument Editor 420 



integration 33 

Intel 425 

interference pattern 296 

interference texture 261 

internal edge 326, 327, 328, 330, 332 

interpenetration 303, 337, 338, 342, 343 

interpenetration mode 335 

interpolation, bilinear 19.3j^2^:. n 

interpolation, video filt#l$26 "W:. 

interrupt 54, 85, 91, jj| 482 MgS:§m 

interruptmessages 54| 1 1 

inverse kinematics 71 

IRIX 30, 67, 77 



K 

kernel 83 

kernel mode 47 

key map 377)^405, 43 

Knuth 502 

KSEG0 34, 47, 114, 



461 



fiEf, 122, 126 



1 ay ere|pn|pii,5 00 
^58.1^^1^ 
level of detail, geometric 70, 497 

level of detail texture 186, 232 

libaudig s 4^,386 1 

hbultra : '4l3i* v " 

libultra.a 31, 77, 78 

libultra_d.a 77, 78 

light structure 156 

lighting 63, 156, 157, 261, 496, 498 

Line 331 

line mode 340 
load block 253 
load block, line limits 264 

load block, restrictions 254 

load tile 250 

LOD 186, 200, 228, 229, 235 

LOD, restrictions 259 

log 87 

loop 414, 436, 440, 455, 463 

loop point 440, 455 

low resolution 46 



M 



M_AUDTASK 474 

M_GFXTASK 474 
Mach band 211, 312 
Macintosh 421 

makerom 77, 88, 115, 119, 123, 126 

matrix stack 144, 475, 498 

matrix stack operations 63 

memory allocation 58, 125 

memory interface 45, 210, 318 

memory management 85, 113 

memory map 58 

memory, block transfer 250 

memory, texture 239 

meshed objects 500 

message 54, 56, 84, 85, 89, 91, 93 

message passing 54 

message queue 93, 104, 372, 472 

MI 45, 176, 210, 318 

microcode, audio 44, 369 

microcode, boot 137 

microcode, graphics 44, 61, 63 

microcode, RSP 43, 45, 47, 60, 137, 216, 469 

microcode, task 137 

MIDI 30, 64, 79, 369, 376, 378, 401, 402, 403, 407, 416, 423, 45% 

457 " 

Midi 421 

MIDI file 449, 463 
MIDI file format 425 
MIDI implementation 449 
MIDI key number 405 

MIDI message 463 

MIDI note 458, 460, 461 

MIDI note number 402, 405 

MIDI note off 406 

MIDI note on 406 

MIDI port, Indy 421 

MIDI sequence 450 

MIDI sequence bank 423 

MIDI sequence file 451 

MIDI velocities 405 

MIDI, compressed 376, 463 

MIDI, compressed file format 431 
MIDI, standard 376 

MIDI, type 376 

midicmp 75 

midicomp 416, 417, 463 

midicvt 75, 416, 463 

midiDmon 419 

midiprint 416 

MIP 232 

MIP maps, generation 232 
mipmapping 150, 179, 184, 223, 229, 232, 291, 333 



MIPS R4300 41 

mirror, texture 280, 281, 295 

mksprite 351 

mode, copy 180 

mode, decal line 334 

mode, fill 180 

mode, interpenetration 335 

mode, one cycle 177 
mode, particle system 336 
mode, point sample 338 
mode, texture edge 333 
mode, two cycle li%MMM;$ 
modeling matrix 144 
modeling software 70 
modular-color 288 

morphing 228, 292 
MULTIBIT_ALPHA 262 
MultiGef 31, 7l!*3;;| 
multiple tile effects '2€f<y- y 
Music Composition 75 
mutual exclusion 105 

Nichimen Graphics 71 

NinGen 70, 72 

Nintendo 64 development board 27, 28, 31 

NMI 96 

noise 302, 312, 337 

non-maskable interrupt 96 

non-preemptive execution 54 

NOOP render mode 344, 345 

NTSC 46 

NURB 71 

Nyquist's Law 271 

O 

ocean waves 261 

octree 493 

one cycle mode 177 

OPA_DEC 343 

OPA_DECAL 339 

OPA_INTER 339 

OPA_SURF 339, 341, 343, 345 

OPA_TERR 339, 341 

opaque surface 327, 329, 330, 332, 333, 335, 337, 338, 341 

OpenGL 62, 138 

operating system 33, 43, 47, 55, 83, 85, 89, 91, 93 

OS 480, 482, 484 



NU6-06-0030-001G of October 21, 1996 






NINTENDO 64 PROGRAMMING MANUAL 



DRAFT 



OS_EVENT_PRENMI 96, 97 
OS_K0_TO_PHYSICAL 121 
OS_PRIORITY_RMON 483 

OS_TASK_DP_WAIT 474 
OS_YIELD_DATA_SIZE 476 

osAiGetLength 111 
osAiGetStatus 111 
osAiSetFrequency 111, 372 
osAiSetNextBuffer 111, 372 
oscDelay 398 
oscDepth 398 
oscillator 397, 398, 399 
osContGetQuery 112 
osContGetReadData 112 
osContInit 112 
osContReset 112 
osContStartQuery 112 
osContStartReadData 112 
oscRate 398 
osCreatePiManager 111 
osCreateRegion 125 
osCreateScheduler 472 
osCreateThread 59, 92 
osCreateViManager 109 
oscState 398 
oscType 398 
osDestroyThread 92 
osDpGetStatus 109 
osDpSetNextBuffer 109 
osDpSetStatus 109 
osFree 126 
__osGetCause 98 

osGetCompare 99 

__osGetConfig 99 
__osGetCurrFaultedThread 34, 100 
__osGetFpcCsr 99 
osGetIntMask 96 

__osGetNextFaultedThread 34, 100 
osGetRegionBufCount 126 
osGetRegionBufSize 126 

__osGetSR 99 
osGetThreadId 93 
osGetThreadPri 93 
osGetTime 55 

__osGetTLBASID 99 
__osGetTLBHi 99 

__osGetTLBLo0 99 
__osGetTLBLo1 99 
__osGetTLBPageMask 99 



osInitialize 88 
osInvalDCache 119, 123 

osInvalICache 123 

osMalloc 125 

osMapTLB 127 

osPiGetStatus 111 

osPiRawReadIo 111 

osPiRawStartDma 111 

osPiRawWriteIo 111 

osPiReadIo 111 

osPiStartDma 112 

osPiWriteIo 111 

osScAddClient 472 

osScGetTaskQ 476 

OSScTask 4|3. 

_osSetCause||S ;:; 

osSetComplpiife, 

„osSetCon^l9''^l||;;:;,,_ 

osSetEventMesg 96, 9fpj|p\ . 

__osSetFpcCsr 99 

osSetIntMask 96 

__osSetSR 99 
osSetThreadPri 93 

osSetTLBASID 127 
,osSpT|Sffioai1||6 
l&sSpflskStart 1^|, 383 

osSpTask Yield J*@9, 471 

osSpTaskYielded 109 

osStarfPnM^l, 92,484 

osStopThread 93 

osSyncPrintf 33, 87 

OSTask 137, 383 

OSThread 90 

osUnmapTLB 127 
osUnmapTLBAll 127 
osViGetCurrentField 110 
osViGetCurrentFramebuffer 110 

osViGetCurrentLine 110 

osViGetCurrentMode 110 

osViGetNextFramebuffer 110 

osViGetStatus 109 

osVirtualToPhysical 121 

osViSetEvent 110 

osViSetMode 46, 110 

osViSetSpecialFeatures 110 

osViSetXScale 110 

osViSetYScale 110 

osViSwapBuffer 110 

osYieldThread 92 
output buffer size, audio 446 
overlay segments 123 
OVL_SURF 343 



paint software 70, 74 

painter's algorithm 340 

PAL 46 

pan 373, 377, 381, 402, 461 

pan values 452 

parallel interface 46 

particle system mode 336 

particle systems 71 

PASS render mode 344, 345 

patch format 426 

PBMPLUS 70 

PBUS 49 

PC 486 

PCL_SURF 339, 341, 343, 345 

percussion instrument 406 

performance profiling 55 

performance tuning 491 

performance, CPU 54 

peripheral interface 56, 86, 102 

peripheral device 43 

perspective correction 215, 277, 498 

perspective normalization 144 

physical address 44, 45, 47, 114, 115, 122, 139 

physical voice 384 

PI 48, 56, 86, 95, 102, 106, 111, 114 

PI manager 46, 56, 86, 90, 95, 111 

PIF 46, 102 

pinwheel 327, 338, 341 

pipeline mode, copy 205, 276 

pipeline mode, fill 205, 210 

pipeline mode, one cycle 205 

pipeline mode, two cycle 387, 200jp03, 228, 23 

pitch 402, 405 

pixel 46 

pixel format, color 210 

pixel format, z 210 

playback rate 453, 459 

player 372 

playseq 384, 388, 389 

point sample mode 338 

point sample, restrictions 259 

point sampling 193, 271, 342 

polygon fragment 327 

polygon rasterization 61, 63 



244 



portal connectivity 493 

position 402 

PRE_NMI_MSG 97 

pre ci s i on , z 3 8 . . ;; jJJ ;: '@ 

preemption 54 

preemptive 84, 92 

PRENMI 95, 96 

PRIM_TILE 235 

primitive 269, 29J# > ""''' : S;> 

primitive color 197, 288 

primitive tile nurttlr, 72Zfg0 :] ' 

PRIMITIVE_COLOR 352 

priority 381 

programcrash 38 If; } fg, 

projection matrix 144 

punchthrough 329, 335 

Q 

quadri cation 254 "" : ii"; : r c : 
quadtree 493 

R 

R4000 44, 46, 135 

R4ibO'42|iM7, 54, 55, 61, 77, 89, 93, 96, 113, 127, 137, 485 

R4300CPt||i6 

RAM 373 

rasterization setup 63 

rasterizer 45, 184 

RCP 41, 48, 49, 55, 60, 61, 65, 94, 102, 113, 135, 301, 351, 383, 

388, 426, 469, 497, 502 
rcp.h 110, 111 

RDP 43, 45, 52, 60, 86, 102, 150, 175, 178, 213, 269 
RDP attribute 182 
RDP pipeline 178 
RDP primitive 182 

RDRAM 48, 49, 58, 102, 105, 109, 318, 442 
Reality Coprocessor 41, 43, 113 
Reality Display Processor 43, 45, 102, 175, 213, 269 
Reality Signal Processor 43, 44, 102 
real-time scheduling 55 
rectangle 45, 184, 269 
rectangle, texture 269 
reduced aliasing 501 
reduction, polygon count 70 
reflection mapping 63, 165, 168, 496 
region allocation 125 
region allocation library 58 
region library 86 
register, R4300 46 

release 374 

release notes 30 

render mode 303 

render mode, visualizing coverage 346 

render modes 339, 341, 343, 344, 345 

rendering mode 338 

rendering order 333, 334, 335, 340, 500 

rendering order, for antialiasing 204 

RESET 96 

retrace message 472 

reverb 381 

reverb amount 381 

RGB, SGI image format 70, 72 

rgb2c 72 

RGBA 188, 215, 221, 240, 247, 290 

RJ-11 29 

RM_ADD 317 

rmon 33, 34, 67, 95, 480, 481, 484 

rmon.h 484 

rmonMain 480 

rmonPrintf 67, 68 

rmonReadMem 481 

ROM 58, 77, 105, 373, 383, 402, 426, 450, 453, 479 

ROM cartridge 46, 48 

ROM image 77 

ROM packing 77 

RS 45, 176, 184 

RSP 34, 43, 44, 45, 47, 52, 60, 61, 102, 135, 206, 37} 

RSP data memory 44 

RSP instruction memory 44 

RSP Scalar Unit 44 

RSP Vector Unit 44 

rspbootTextEnd 474 

rspbootTextStart 474 



s/w 184, 186 

sample converter 455 

sample rate 459 

sample rate, audio 443 

sampled sound playback 369, 373 

sampling 271 

sampling, point 271 

sampling, super 303 

sampling, unweighted area 3§3 ; ffff 

sbc423~438, 463 

sbk 75 

scaling, rectangle 271 



454 



scaling, sprites 294 

scheduler 65, 469, 472 

scheduler thread 65 

scheduler, CPU 54 

scheduling, priority 54 

scintillate 271 

scissor rectangle 185 

scissoring 184 

scissoring, rectangle lj>fe %: - 

scissoring, restrictio nj|f % 5 

scrolling.of rectangles p75 

scrolling, texture Ift&'WM^^ff 

Sedgewick 502 

segment address 34, 44, 121 

segment number 121 

segment offset 121 

segment rablfpf^-:^ 

segmented 4J3resVi||:47, 115, 138, 174 

semaphore 85 

semitone 459, 460 

sequence bank compiler 438 

sequence bank file 423 
sequence bank format 438 

sequence buffer 442 

sequfjtee datlt§f 6, 450 
" sequence loop l^fnt 376 

sequence loopJIf&O 

sequence playback 376 

sequence player 75, 369, 370, 372, 376, 378, 394, 398, 401, 404, 
405, 425, 426, 450, 458, 461 

sequence, audio 447 

sequenced sound 376 

sequencer 431 

serial interface 46, 102 

serial port manager, Indy 421 
SETOTHERMODE 174 

sgi.com 70 

SH 284 

sharpened texture 229, 230, 235 

SI 48, 95, 102, 114 

silhouette 303, 314, 327, 328, 330, 332, 343, 344 

silhouette edge 204, 328, 333, 334, 337, 340 

simple 384 

simple, demo application 65 

size, texture 289 

SL 284 

slide, texture 283 

smoke 316 

SNES 29, 74, 455 
Softimage 71 

sort 330, 500 

sorting 298, 330, 502 

sorting algorithms 502 

sound 457 

sound bank 401 

sound duration 374 

sound effect 64, 450 

sound loop point 374 

sound pitch 374 

sound playback rate 453 

sound player 369, 370, 372, 373, 394, 401, 407, 426, 450, 458, 461 

sounds, looped 374 

sounds, unlooped 374 

source file 487 

SP 95, 109, 114, 122 

SP_BREAK 95 

SP_CUTOUT 356 

SP_DRAM_STACK_SIZE8 475 

SP_EXTERN 357 

SP_FASTCOPY 356 

SP_FRACPOS 357 

SP_HIDDEN 356 

SP_SCALE 356 

SP_TEXSHIFT 356 

SP_TEXSHUF 357 

SP_TRANSPARENT 356 

SP_UCODE_DATA_SIZE 474 

SP_UCODE_SIZE 474 

sp_z 356 

span buffer coherency 182, 499 

sparkles 336 

spClearAttribute 352 

spColor 352 

spDraw 353, 356, 359 

specular 156 

specular highlight 361 

spFinish 351 

spgame 360 

spInit 351 

spMove 352 

sprite 45, 70, 262, 269, 273, 279, M^294, 297, 298, 349 

sprite library 349 

sprites, attribute 352, 355 

sprites, bitmap structure' §i>:4,,.. 

sprites, color 352 

sprites, creating 351 

sprites, cutout 356 

sprites, drawing 353 

sprites, examples 360 

sprites, in COPY mode 356 
sprites, moving 352 

sprites, re-use 359 

sprites, scaling 352, 356 

sprites, scissoring 353 

sprites, structure 354 

sprites, transparent 3| : 6^.< :: , : 

s pri tes , z-buffered J$2 :l ' : * v<;=% 

spScale 352 

spScissor 352 

sp Set Attribute 3&Am0T 

spSetZ 352 

sptask.h 137 

stack overflow 55 

stack, trire^.59 

stacktoo.^;S^§;?;;:„ 

stereo 4§fr ''yMy. 

stipple transparency ?336, 

stopOsc 397, 398, 399 

SU 44 

SUB_SURF 338, 339, 341 
SUB_TERR 340, 342 

subpixel 306 

subpixel mask 306 

Super Famicom 74 

Super Nintendo Entertainment System 29 

surface types 203 

sustain 1 j 81 

SW1 95 

SW2 95 

sync command 45 

sync, pipe 45 

synchronization, of rendering pipeline 181 

synthesis driver 369, 370, 382, 394 

synthesizer 372 

T 

t/w 184, 186 

tabledesign 76, 412, 462 

task 65, 89, 109, 137, 469, 502 

task header 137 

task list 43, 60, 137 

tasks 42, 43 

terrain 335, 340, 496 

terrain mode 338, 341 

TEX_EDGE 317, 332, 339, 341, 345 

TEX_INTER 339 

TEX_TERR 339, 342 
Chapter 1 Introduction and Installation 
Introduction to Z-Sort Microcode 

Z-Sort Microcode was developed to perform hidden-surface removal at the Nintendo 64 (N64) hardware level 
using a Z-sort. Z-Sort builds the screen by sorting all the graphics to be displayed in order of their 
depth on the screen and then drawing them in order from back to front. 
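
The back-to-front ordering described above can be sketched in C as follows. This is only an illustration of the idea, not code from the Z-Sort package: the Graphic structure, the function names, and the convention that a larger depth value means farther from the viewer are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical drawable item carrying a single depth value. */
typedef struct {
    int id;     /* illustrative identifier */
    int depth;  /* assumed convention: larger = farther from the viewer */
} Graphic;

/* qsort comparator: farthest first, so drawing proceeds back to front. */
static int farther_first(const void *a, const void *b)
{
    const Graphic *ga = (const Graphic *)a;
    const Graphic *gb = (const Graphic *)b;
    return (gb->depth > ga->depth) - (gb->depth < ga->depth);
}

/* Reorder the array into back-to-front drawing order. */
void sort_back_to_front(Graphic *items, size_t count)
{
    assert(items != NULL || count == 0);
    if (count > 1) {
        qsort(items, count, sizeof items[0], farther_first);
    }
}
```

After the call, drawing the array from index 0 upward paints nearer graphics over farther ones.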

The N64 OS/Library supports hidden-surface removal using the Z-Buffer. This method judges whether or 
not a graphic is visible on a pixel-by-pixel basis. Compared with Z-Sort, it has the advantage of 
accurately expressing the front-to-back relationship between graphics. On the other hand, access to 
RAM increases. With Z-Sort, although the front-to-back relationship cannot be resolved as accurately as 
with the Z-Buffer, the amount of RAM access per graphic decreases. Thus, the number of graphics that 
can be displayed on the screen within a given time increases compared to the Z-Buffer method. 

The advantage of Z-Sort is that the reduced RAM bandwidth usage lightens the RDP processing load. In 
many applications, the time required for RDP processing is the bottleneck. A lighter RDP processing 
load is therefore ideal when the volume of graphics is high. 

One note of caution, however: the RSP processing load does not change significantly. The RDP 
processing load changes according to the size of the area to be filled. When drawing small areas in 
particular, RDP processing ends sooner than RSP processing. When there are many small drawing areas, 
the RDP waits for RSP processing to end, and in that case the processing capacity is the same with 
Z-Sort as with the Z-Buffer. When the drawing areas are somewhat larger, however, the Z-Sort method is 
effective. Z-Sort Microcode cannot do everything. Carefully consider the screen to be drawn before 
using Z-Sort. 

Installation 

This description pertains to installation of Z-Sort Microcode when it is distributed as a separate package. 
If it is already included in the N64 OS/Library, these operations are not necessary. 

Confirm Package Installation 

This microcode runs on N64 OS/Library version 2.0H or later. When using 2.0H, confirm that the 
following packages have been installed. If they are not installed, install them first. 

ultra                N64 OS/Library Version 2.0H 

patchNmisc_082297    Patch Nmisc_082297: miscellaneous patches for 
                     N64 OS/Library version 2.0H 

The Z-Sort package includes the following patch, so it need not be obtained separately. If the 
following patch is already installed, simply install the Z-Sort package as instructed below. 

patchNgbi_040997     Patch Ngbi_040997: patch for gSP1Quadrangle() in gbi.h for 
                     N64 OS/Library version 2.0H 
For IRIX 5.3, 6.2, 6.3 

The Z-Sort Microcode package is formatted as follows. 

patchNuZST_mmddyy (mmddyy is the release date) 

Install this patch using the Software Manager or the inst command. This will install the following files. 
For details on the microcode, see the README file. 

/usr/src/PR/doc/gfxucode.Z-Sort/README    README file 
/usr/lib/PR/gspZ-Sort.fifo.o              Z-Sort Microcode 
/usr/lib/PR/gspZ-Sort.pl.fifo.o           Z-Sort Microcode (version with improved 
                                          arithmetic operations) 
/usr/include/PR/gbi.h                     Z-Sort include file 
/usr/include/PR/gZ-Sort.h                 Z-Sort include file 
/usr/include/PR/rcp.h                     Z-Sort include file 
/usr/src/PR/gZ-Sort/*                     Z-Sort sample programs 



For Partner-N64PC (Windows95/NT) 

The Z-Sort Microcode package is formatted as follows. 

Z-SORTxxx.EXE (xxx is the release number) 

This file is self-extracting. When it is executed, the user is asked for the installation destination. 
Enter the ROOT directory of the N64 OS/Library. The default is c:\ultra. The files are extracted under 
the specified directory with the same layout as the IRIX version. 
Chapter 2 Z-Sort Microcode Functions 
Drawing Flow Using Z-Sort 

Z-Sort Microcode supports triangle areas, quadrangle areas, and texture and fill rectangles drawn using 
RDP commands. In this manual, all areas to be drawn by the RDP are called ZObjects. 

In Z-Sort Microcode, one screen depth value is found for each ZObject to represent its drawing area. 
The ZObjects are then sorted by that screen depth, and hidden-surface removal is achieved by 
drawing the ZObjects in order from the back to the front. 
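
The text does not say how the single representative depth is chosen, so the following C sketch simply averages the screen depths of the vertices of the area. The averaging policy and the function name are assumptions made for illustration; the nearest or farthest vertex depth would be an equally valid choice.

```c
#include <assert.h>

/*
 * Hypothetical helper: choose one depth for a drawing area from the
 * screen depths of its vertices (3 for a triangle, 4 for a quadrangle).
 * Averaging is an assumed policy, not one taken from the microcode.
 */
int representative_depth(const int *vertex_depth, int vertex_count)
{
    long sum = 0;
    int i;

    assert(vertex_depth != 0 && vertex_count > 0);
    for (i = 0; i < vertex_count; i++) {
        sum += vertex_depth[i];
    }
    return (int)(sum / vertex_count);
}
```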

The processing flow for ZObject drawing is as follows. 

1. Multiply model matrix by perspective transformation matrix, etc. 

2. Calculate coordinate transformation/perspective transformation/screen depth for model 
vertices. 

3. Determine whether there are vertices in the screen. 

4. Perform clipping/back-face determination. 

5. Construct ZObject data. 

6. Create ZObject list. 

7. Draw in order of ZObject list (drawing processing). 

To draw a ZObject, the information describing how the ZObject will be drawn must be prepared 
as data. With conventional Fast3D Microcode, the Vertex and Tri commands are combined to draw 
triangles, while with Z-Sort Microcode, drawing is performed by creating ZObject structures. 

Not all of these processes are available in Z-Sort Microcode. The major difference between Z-Sort and 
other graphics microcodes is that Z-Sort Microcode does not function by itself; the CPU must perform 
some of the processing related to drawing. 

For example, the function of sorting ZObjects in order of screen depth is not available as microcode. 
Since the RSP does not perform sorting, that function must be handled by the CPU. 

At the very least, the CPU must perform the following processes. 

Clipping/back-face determination 

ZObject data construction 

ZObject list creation 
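
Of the CPU-side processes listed above, the back-facing test might look like the following C sketch. It assumes that the determination amounts to a conventional back-face test on transformed screen coordinates, and that front-facing triangles have positive signed area; the type, the function names, and the winding convention are all hypothetical.

```c
#include <assert.h>

/* Hypothetical screen-space vertex position after transformation. */
typedef struct {
    int x;
    int y;
} ScreenPoint;

/* Twice the signed area of triangle (a, b, c) in screen coordinates. */
long signed_area2(ScreenPoint a, ScreenPoint b, ScreenPoint c)
{
    return (long)(b.x - a.x) * (c.y - a.y)
         - (long)(c.x - a.x) * (b.y - a.y);
}

/*
 * Returns 1 when the triangle is back-facing or degenerate and can be
 * skipped. The sign convention is an assumption.
 */
int is_back_facing(ScreenPoint a, ScreenPoint b, ScreenPoint c)
{
    return signed_area2(a, b, c) <= 0;
}
```

Triangles rejected here never become ZObjects, which reduces both the sorting cost on the CPU and the drawing load.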

Z-Sort Microcode currently offers the following main functions. Each process is controlled by the Display 
List (DL), which is composed of one or more GBI commands. 

Multiplication of model matrix by perspective transformation matrix 

Calculation of coordinate transformation/perspective transformation/ screen depth for model 
vertices 

Creating flags for whether or not vertices are in the screen 

Drawing in order of ZObject list (drawing processing) 

Naturally, matrix multiplication and coordinate transformation (here, called arithmetic operation 
processing) could also be performed by the CPU. Dividing these tasks between the CPU and the RSP 
according to available processor capacity is best. For the remainder of the explanation, however, it is 
assumed that the RSP will perform arithmetic operation processing. If the CPU is to perform operation 
processing, read about the arithmetic operation processing explained in chapter 4. 
Drawing and Arithmetic Operations 



When the RSP performs the arithmetic operations, Z-Sort Microcode processing uses two passes. Until 
the coordinate transformation of every ZObject, both in front and behind, has been completed, the final 
Z-Sort order cannot be determined. This means that data cannot flow through a pipeline as it does with 
other microcodes. All of the ZObject information must be held temporarily. 

Thus, the following functions related to coordinate transformation are called arithmetic operation 
processing and are performed on the first pass. These processes are "vertex" coordinate 
transformations, so ZObject plane data is not created at this time. Note that the CPU creates actual 
ZObject plane data from the results of vertex coordinate transformation. 

• Multiplication of model matrix by perspective transformation matrix 

• Calculation of coordinate transformation/perspective transformation/ screen depth for model 
vertices 

• Determination of whether there are vertices in the screen 

The next process, which follows Z-sorting by the CPU, is called "drawing processing" and is performed 
on the second pass of the RSP. 

• Drawing in order of ZObject list 

This ZObject list is a chain of data like the one below, in which ZObject data are linked as a list in 
order from the back of the screen. The X in ZObj 3 below signifies the end of the chain. 

GBI --> ZObj 1 --> ZObj 2 --> ZObj 3 
        Data       Data       Data 
                              X 



The CPU must create this ZObject list. Because Z-Sort Microcode accepts ZObjects in list format, the 
cost of moving data when sorting can be kept to a minimum. Any sorting algorithm may be used. 
Incidentally, the sample program for this microcode performs a bucket sort divided into 1024 
steps between the far and near planes by creating multiple ZObject lists. 
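
The sort into 1024 depth steps described above can be sketched in C as follows. This is not the sample program's code: the Node and DepthBuckets types and the function names are hypothetical, and the real ZObject layout is defined in gZ-Sort.h. The sketch only shows why linking objects into per-step lists avoids copying data.

```c
#include <assert.h>
#include <stddef.h>

#define BUCKET_COUNT 1024  /* number of depth steps between near and far */

/* Hypothetical CPU-side node standing in for a ZObject. */
typedef struct Node {
    struct Node *next;  /* next object in the same bucket, NULL at the end */
    int depth;          /* representative screen depth */
} Node;

typedef struct {
    Node *head[BUCKET_COUNT];  /* head[BUCKET_COUNT - 1] is the farthest step */
    int near_depth;
    int far_depth;
} DepthBuckets;

void buckets_init(DepthBuckets *b, int near_depth, int far_depth)
{
    size_t i;

    assert(b != NULL && far_depth > near_depth);
    for (i = 0; i < BUCKET_COUNT; i++) {
        b->head[i] = NULL;
    }
    b->near_depth = near_depth;
    b->far_depth = far_depth;
}

/* Map a depth to a bucket index, clamping to the near/far range. */
int bucket_index(const DepthBuckets *b, int depth)
{
    long long span = (long long)b->far_depth - b->near_depth;

    if (depth <= b->near_depth) return 0;
    if (depth >= b->far_depth) return BUCKET_COUNT - 1;
    return (int)(((long long)depth - b->near_depth) * BUCKET_COUNT / span);
}

/* O(1) insertion: link the node at the front of its bucket; nothing is copied. */
void buckets_insert(DepthBuckets *b, Node *n)
{
    int i = bucket_index(b, n->depth);

    n->next = b->head[i];
    b->head[i] = n;
}
```

Walking the buckets from the highest index down to 0 then yields the objects in back-to-front order, with objects inside one bucket left unsorted.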

Once the above is complete, the processing flow continues as follows. 

[CPU]      Create arithmetic operation Display List 
              | 
[RSP]      Arithmetic operation 
              | 
[CPU]      Create ZObject data 
           Create ZObject list (= Display List for drawing) (Z-Sort) 
              | 
[RSP/RDP]  Drawing processing 

RSP Processing Implementation Methods 

It was discussed above that the RSP processing is divided into two passes. The methods for 
implementing this will be explained here. 

• Implementation method A) 2-task processing 

• Implementation method B) 2-pass parallel processing 

A detailed explanation follows. Since A and B each has advantages and disadvantages, select the 
implementation method carefully. 
2-Task Processing 

The 2-task processing method divides the work into two tasks, arithmetic operation processing and 
drawing processing, and starts each one separately. This should be an easy method to understand 
since it resembles the way other microcodes are started. Understand this principle first. 

Implementation Method A 

This is the simplest 2-task processing method. It is listed below. 

1. Create the Display List for arithmetic operations. 

2. Start the first task of the RSP (Display List for arithmetic operations). 

3. The RSP performs calculations and the CPU waits until the RSP is done. 

4. a. Create ZObject data using the calculation results. 

b. Create ZObject processing links by sorting. 

c. Create the Display List for drawing processing. 

5. Start the second task of the RSP (Display List for drawing processing). 

6. The RSP performs drawing calculations and the CPU waits until the RDP is done. 

7. The RDP performs drawing. 
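
The strictly sequential nature of method A can be sketched from the CPU's point of view as below. Every function here is a hypothetical placeholder that only records its step number; a real program would build Display Lists, start RSP tasks, and wait on OS messages at these points. The sketch is meant to show the ordering only.

```c
#include <assert.h>
#include <string.h>

/* Records the order of steps so the sequencing can be inspected. */
static char step_log[16];

static void log_step(char id)
{
    size_t n = strlen(step_log);

    assert(n + 1 < sizeof step_log);
    step_log[n] = id;
    step_log[n + 1] = '\0';
}

/* Hypothetical placeholders for the work done at each numbered step. */
static void cpu_build_arith_dl(void)   { log_step('1'); }
static void cpu_start_arith_task(void) { log_step('2'); }
static void cpu_wait_for_rsp(void)     { log_step('3'); }
static void cpu_build_zobjects(void)   { log_step('4'); }  /* 4a to 4c */
static void cpu_start_draw_task(void)  { log_step('5'); }
static void cpu_wait_for_rdp(void)     { log_step('6'); }  /* spans 6 and 7 */

/* One frame of method A: each step starts only after the previous one ends. */
const char *run_frame_method_a(void)
{
    step_log[0] = '\0';
    cpu_build_arith_dl();
    cpu_start_arith_task();
    cpu_wait_for_rsp();
    cpu_build_zobjects();
    cpu_start_draw_task();
    cpu_wait_for_rdp();
    return step_log;
}
```

Because the CPU blocks in the two wait steps, a single ZObject buffer is enough: it is never written while the RSP is reading it.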

This is the simplest method and, therefore, the easiest to understand. It is effective for shortening the 
time between key input and screen response. Also, since a single buffer is sufficient as the buffer for 
developing ZObject data, the amount of memory that must be reserved is smaller. 

As with all implementation methods, constructing ZObjects on the CPU in (4) a-c above carries a 
considerable computation cost. The number of ZObjects that can be drawn per frame depends on 
how this portion is implemented. If possible, it is recommended that you use 
assembly language instead of C for this part of the implementation. 

The operating status of the CPU, RSP, and RDP for each process is shown below. The numbers in 
parentheses correspond to those above. 



[Figure: timing chart from Frame Start to Frame End with rows for the CPU, RSP, and RDP, showing 
processes (1) through (6) carried out one after another.] 



Implementation Method B 

One of the problems with implementation method A is that there is nowhere that the CPU and RSP 
operate in parallel. This leaves idle time on both the CPU and the RSP. Pipelining processes 
(3) and (4) of method A would eliminate some of that idle time. The data for a given number of 
ZObjects can be created as soon as the vertex data they need is available. To support this, Z-Sort 
Microcode contains a GBI command that sends a message to the CPU. When this command is inserted 
midway through the arithmetic operation processing GBI commands, the RSP sends the message to the 
CPU when the command is processed. When the CPU receives the message, it knows that the arithmetic 
operations prior to the command that sent the message have been completed. 
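
A minimal C sketch of the CPU side of this pipelining is shown below. It assumes that each message tells the CPU how many vertices the RSP has finished transforming; that payload, the PendingTri structure, and the function name are hypothetical and are not taken from the microcode.

```c
#include <assert.h>

/* Hypothetical pending triangle: three indices into the RSP vertex output. */
typedef struct {
    int v[3];
    int built;  /* nonzero once its ZObject data has been created */
} PendingTri;

/*
 * Called each time a progress message arrives. vertices_done is the number
 * of vertices transformed so far (an assumed message payload). Builds every
 * not-yet-built triangle whose vertices are all available and returns how
 * many were built by this call.
 */
int build_ready_triangles(PendingTri *tris, int tri_count, int vertices_done)
{
    int i;
    int built_now = 0;

    assert(tris != 0 || tri_count == 0);
    for (i = 0; i < tri_count; i++) {
        PendingTri *t = &tris[i];

        if (!t->built &&
            t->v[0] < vertices_done &&
            t->v[1] < vertices_done &&
            t->v[2] < vertices_done) {
            /* Real code would fill in the ZObject structure here. */
            t->built = 1;
            built_now++;
        }
    }
    return built_now;
}
```

The CPU would call this once per received message, so ZObject construction overlaps with the remaining RSP arithmetic instead of starting only after all of it has finished.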
Also, the RDP does nothing during processes (1) to (5) of method A. This RDP idle time means 
reduced drawing performance. To make use of that time, it is best that RDP drawing processes that 
do not require RSP operations, such as screen clearing, be performed within the 
first RSP pass. 

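As one example, improving the above points gives the following. 

[Figure: timing chart from Frame Start to Frame End with rows for the CPU, RSP, and RDP. After 
(1) and (2), all three processors are active during the third stage, which is followed by (5) and (6).] 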






In the third stage, each processor performs the following processing. 

CPU:  Creates ZObject data from RSP coordinate calculation data and sorts it. Also 
      creates the DL for the second pass. 

RSP:  Performs coordinate calculations and sends a message to the CPU every time the 
      vertex data necessary to create a certain amount of ZObject data has been 
      obtained. 

RDP:  Primarily performs processing that does not require RSP operations, such as 
      screen clearing. 

Implementing this processing system is more complex than implementing method A. Its difficulty is 
not significantly different from that of the 2-pass parallel processing described below. Since the 
performance gain from pipelining (3) and (4) is generally not that great, a different method should 
be used unless reducing the delay in response time after key input and reducing the memory 
footprint are important. 

Implementation Method C 

If the delay in key input response time is acceptable, the following implementation method may be used. 
The processes (5) through (7) are carried over to the next frame. 



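[Figure: timing chart from Frame Start to Frame End with rows for the CPU, RSP, and RDP, in which 
(5) through (7), carried over from the previous frame, appear together with (1) through (4).] 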



In this case, the time between key input and screen response becomes longer, but more time is 
available for RDP processing. 
Since processes (3) and (4) must wait until (6) has ended, the processing time of (6) in the RSP and the 
processing time of (4) in the CPU must be as short as possible. Since the time the RSP must wait for 
RDP drawing decreases when the FIFO buffer is enlarged, the processing time of (6) normally shortens, 
boosting the performance of this processing system. When numerous small ZObjects appear, however, 
the RSP processing time becomes longer than the RDP's. Since the RDP then waits for the RSP, 
performance does not improve even when the FIFO buffer is enlarged. Thus, it would appear that (4) 
should be implemented using assembly language. 

Considering the ease of implementation and performance, this method appears to be the most balanced 
among the 2-task processing methods. 

Implementation Method D 

In the rare event that (3) can be implemented with sufficient performance on the CPU 
instead of the RSP, problem-free parallel processing would be possible, as shown below. However, 
since (6) and (4) sometimes overlap, ZObject data and the DL must be processed using a double buffer. 



[Figure: timing chart from Start to End with rows for the CPU, RSP, and RDP. The CPU performs 
(3) and (4) while the RSP and RDP perform (6) and (7).] 
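
The double buffering required by method D can be sketched as a simple index swap. The structure and function names below are hypothetical; the point is only that the CPU writes the ZObject data and DL for the next frame into one buffer while the RSP reads the other.

```c
#include <assert.h>

#define ZOBJ_BUFFER_COUNT 2

/* Tracks which of the two buffers the CPU is currently filling. */
typedef struct {
    int cpu_index;  /* buffer receiving new ZObject data and the drawing DL */
} DoubleBuffer;

void double_buffer_init(DoubleBuffer *db)
{
    assert(db != 0);
    db->cpu_index = 0;
}

/* Buffer currently being read by the RSP for drawing. */
int double_buffer_rsp_index(const DoubleBuffer *db)
{
    return (db->cpu_index + 1) % ZOBJ_BUFFER_COUNT;
}

/* Call once per frame, after the RSP has finished with its buffer. */
void double_buffer_swap(DoubleBuffer *db)
{
    db->cpu_index = double_buffer_rsp_index(db);
}
```

The cost is that memory for ZObject data and the DL is reserved twice, which is the trade-off against method A's single buffer.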



Whether or not this implementation improves performance depends on the extent to which (3) can be 
performed faster. If possible, use assembly language for this part of the implementation as was done 
with (4). 

2-Pass Parallel Processing 

In graphics processing, the RDP processing time rarely matches the RSP processing time. The FIFO 
buffer exists to absorb this difference. When the RDP processing time exceeds the RSP processing 
time, RDP commands that the RSP has finished generating accumulate in the FIFO buffer. Since the 
FIFO buffer size is limited, if the wait is too long, the buffer becomes full. 

In other microcodes (Fast3D, F3DEX, S2DEX), when the buffer is full, the RSP waits until space opens 
up in the FIFO buffer. Merely waiting for RDP processing needlessly consumes the calculation capacity 
of the RSP. 

To eliminate this waste in Z-Sort Microcode, the RSP can perform other DL processing (mainly, 
arithmetic operation processing) while waiting for RDP processing. This combines arithmetic operation 
processing and drawing processing into a single task for a pseudo-parallel processing called 2-pass 
parallel processing. 

In 2-pass parallel processing, the DL processed within the RSP stand-by time is called the Sub Display 
List (Sub DL). Here, as in conventional microcodes, the normal DL is called the Main DL to distinguish it 
from the Sub DL. Just like the Main DL, the Sub DL has 18 dedicated DL stacks. Since the Sub DL is 
processed while the RSP is waiting for RDP processing, the GBI commands that can be processed by 
the Sub DL are limited. Naturally, commands using the RDP cannot be executed. Only commands 
using the RSP can be used. If GBI commands using the RDP are included in the Sub DL, a malfunction 
will result. Specific GBI commands which can be included in the Sub DL will be explained later. Mainly 
arithmetic operation commands can be used. 

In actual processing, the RDP processing time is not always longer than the RSP processing time, and if 
the RDP drawing area is small, the RSP stand-by time mentioned above disappears. When this happens, 
the Sub DL is not processed unless expressly called by the Main DL. 
With this specification alone, inconveniences can be expected. Since the RDP drawing 
area varies depending on the scene to be drawn, the RSP stand-by time in which the Sub DL can be 
processed is not constant. However, RSP arithmetic operation processing must end within a certain 
time to guarantee the CPU's ZObject creation time. This is why processing the Sub DL even outside 
the RSP stand-by time is so desirable. 

For the above reasons, a microcode gspZ-Sort.pl.fifo.o (Z-Sort.pl ucode) has been prepared 
that starts the GBI commands in the Sub DL one at a time, each time a certain amount of ZObject 
processing is completed, even outside the RSP stand-by time. The timing for calling the Sub DL 
commands differs depending on the type of ZObject drawn. For polygon ZObjects, one Sub DL 
command is called for every two to four ZObjects. 

In contrast to Z-Sort.pl ucode, the microcode gspZ-Sort.fifo.o (Z-Sort ucode) processes the Sub DL 
only during RSP stand-by. 

Because Z-Sort.pl ucode performs this additional processing, its overhead is larger 
than that of Z-Sort ucode. Therefore, Z-Sort ucode offers slightly better RDP drawing performance. 
The two types of microcode are identical except for the difference in how the Sub DL is called and 
the larger overhead. Select the type according to the circumstances. 

The 2-pass parallel processing implementation is as follows. Here, (3) and (6) are processed in parallel. 
[Figure: timeline from Frame Start to Frame End with lanes for the CPU, the RSP (Main and Sub), and
the RDP, showing the numbered processing steps]

Chapter 3 Drawing
Drawable Objects (ZObject)

As explained earlier, in Z-Sort Microcode, graphics are drawn in drawing areas called ZObjects. The
drawing parameters for each type of ZObject are defined by the corresponding structure, listed below.

    zShTri     triangle with smooth shading
    zShQuad    quadrangle with smooth shading
    zTxTri     textured triangle with smooth shading
    zTxQuad    textured quadrangle with smooth shading
    zNull      other drawing areas using RDP commands
               (used for Fill Rectangle and Texture Rectangle)

Unfortunately, due to size limitations, Z-Sort Microcode does not provide ZObjects for drawing triangles
and quadrangles with flat shading. To draw these, specify the same color for all vertices.

Although the microcode supports only these simple types of graphics, a wide variety of graphics
can be drawn using the libraries on the CPU side. For details, refer to the sample programs.
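
The flat-shading workaround mentioned above can be sketched as follows. This is only an
illustration: ShVtx is a local stand-in that mirrors the zShVtx vertex layout defined later in this
chapter, and set_flat_color is a hypothetical helper, not part of the Z-Sort library.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the zShVtx layout (an assumption for this sketch). */
typedef struct {
    int16_t x, y;        /* screen coordinates (s10.2) */
    uint8_t r, g, b, a;  /* vertex color, 0..255 */
} ShVtx;

/* Hypothetical helper: give every vertex the same color so that
 * smooth shading degenerates into flat shading. */
static void set_flat_color(ShVtx *v, int count,
                           uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    assert(v != 0 && count > 0);
    for (int i = 0; i < count; i++) {
        v[i].r = r;
        v[i].g = g;
        v[i].b = b;
        v[i].a = a;
    }
}
```

The same helper would be called with a count of 3 for a triangle or 4 for a quadrangle; the
coordinates are left untouched.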

ZObject List Processing 

Since ZObjects can be put into a list format, pointer data for the next ZObject and the type ID for the 
next ZObject can be saved at the head of the structure. The 4 bytes at the head of all ZObject structures 
are reserved as the header area. ZObjects can be formatted as a list depending on the values of these 4 
bytes. 

[Figure: the GBI command g[s]SPZObject points to ZObj1, which links to ZObj2, which links to ZObj3]

When the pointer and ZObject type ID of the head ZObject in the list (ZObj1 in the figure above) are
specified by the GBI command g[s]SPZObject, the RSP draws in order according to this list.

From 0 to 2 lists can be processed by the GBI command g[s]SPZObject. In other words, two ZObject
lists A and B can be drawn by one GBI command.

[Figure: one GBI command g[s]SPZObject referencing ZObject list A (ZObj1 -> ZObj2 -> ZObj3 -> end)
and ZObject list B (ZObj4 -> ZObj5 -> end)]


The minimum size of a GBI command is 8 bytes, which is equal to two pointer data of 4 bytes each. If
fewer than two lists are being drawn, write the end value (= G_ZOBJ_NONE) in the empty
space.

The data format of the GBI command g[s]SPZObject is as follows, with the front and back halves
being the same.

    gSPZObject(Gfx *gp, u32 listA, u32 listB)

    listA    Link parameter of ZObject list A = ZHDR(ListA, ZidA)
    listB    Link parameter of ZObject list B = ZHDR(ListB, ZidB)

     31                             3 2        0
    +--------------------------------+----------+
    |             ListA              |   ZidA   |
    +--------------------------------+----------+
    |             ListB              |   ZidB   |
    +--------------------------------+----------+

    ListA/ListB    Bits 31 to 3 of the pointer to the ZObject list. The head 8 bits of the
                   pointer must be 0x80. (They are normally 0x80.)
    ZidA/ZidB      The ZObject type ID of the head of the ZObject list

ZHDR(pointer, type) has been provided as a macro for setting these data (32 bits), and can be used as
follows.

    gSPZObject(gfx, ZHDR(ptr_listA, ZH_SHTRI), ZHDR(ptr_listB, ZH_TXTRI));

To change only list A or list B, the value may be written directly, as shown below.

    *((u32 *)gfx) = ZHDR(ptr_listA, ZH_SHTRI);

ZH_xxxxx is the ZObject type ID and takes the following five values.

    ZH_SHTRI     triangle with smooth shading
    ZH_SHQUAD    quadrangle with smooth shading
    ZH_TXTRI     triangle with texture map and smooth shading
    ZH_TXQUAD    quadrangle with texture map and smooth shading
    ZH_NULL      other drawing areas using RDP commands

Although only gSPZObject has been explained here, gsSPZObject also exists. Further GBI command
explanations follow in later chapters; however, as with this GBI, gsSPZ*** explanations will be omitted.
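
The packing described above (pointer in bits 31 to 3, type ID in bits 2 to 0) can be sketched as
follows. This is not the real ZHDR macro: pack_zhdr, zhdr_addr, zhdr_zid, and the MY_ZH_* values
are hypothetical names and values chosen for illustration, since the actual ZH_xxxxx constants are
not given in this text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type ID values; the real ZH_xxxxx values come from the
 * Z-Sort header and are not given in this text. */
enum { MY_ZH_NULL = 0, MY_ZH_SHTRI = 1, MY_ZH_TXTRI = 2,
       MY_ZH_SHQUAD = 3, MY_ZH_TXQUAD = 4 };

#define ZID_MASK 0x7u   /* bits 2..0 hold the type ID */

/* Pack an 8-byte-aligned address and a 3-bit type ID into one word. */
static uint32_t pack_zhdr(uint32_t addr, uint32_t zid)
{
    assert((addr & ZID_MASK) == 0);   /* low 3 bits must be free */
    assert(zid <= ZID_MASK);
    return addr | zid;
}

/* Recover the two fields from a packed word. */
static uint32_t zhdr_addr(uint32_t hdr) { return hdr & ~ZID_MASK; }
static uint32_t zhdr_zid(uint32_t hdr)  { return hdr & ZID_MASK; }
```

Because the type ID occupies the low three bits, the sketch requires the ZObject address to be
8-byte aligned, which is consistent with the u64 alignment member found in each ZObject union.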

Z-Sort Processing

The GBI command g[s]SPZObject is a structure holding only the pointer to a ZObject list and the type
ID of its head ZObject. When this command is arranged in an array, however, three or more ZObject
lists can be processed. For each screen depth, a ZObject list is created from the ZObjects with nearly
the same screen depth. By placing these lists in g[s]SPZObject commands in order, starting from the
list at the back of the screen, the ZObjects can easily be bucket sorted.

The processing procedure is as follows. In this example, processing is performed dividing the screen
depth for each ZObject into 1024 steps.

Preparation of gSPZObject Array

Since there is one ZObject list per screen depth step and each command holds two lists, 512 commands
(= 1024/2) are required as the gSPZObject array size. Since this array becomes part of the DL and is
processed directly, as is, by the RSP, gSPEndDisplayList is added to the very end of the gSPZObject
array. As a result, the required size becomes 513 commands (= 512 + 1).

    Gfx zarray[1024/2+1];

Array initialization

Substitute the end value (G_ZOBJ_NONE = 0x80000000) into all array elements to initialize the array.
Write EndDL at the very end, as shown below.

    Gfx *zp = zarray + 512;

    gSPEndDisplayList(zp);
    while (zp != zarray) {
        gSPZObject(--zp, G_ZOBJ_NONE, G_ZOBJ_NONE);
    }



[Figure: zarray after initialization. Elements 0 through 511 each hold two end values (X) and
element 512 holds EndDL. X: End value (= G_ZOBJ_NONE)]



Array Registration According to Screen Depth of each ZObject

Calculate the screen depth for each ZObject. Although the RSP can calculate the value of the screen
depth at each point, the decision as to which value to use as the screen depth for the ZObject is up to the
user. Here are some examples of screen depth values.

Examples of screen depths in triangle ZObjects

    Smallest value for distance from 3 vertices
    Largest value for distance from 3 vertices
    Average value for distance from 3 vertices
    Median value between largest and smallest values for distance from 3 vertices
    The inverse of any of these values

This value is normalized to between 0 and 1023 and is the number of the array element to register.
Store the pointer and ZObject type ID that originally existed in the applicable array element in the
header of the ZObject being registered, and then write the pointer to the ZObject structure data and
the ZObject type ID to the array element.

    s32      zid;      /* No. of array element to register */
    zHeader *zhptr;    /* ZObject pointer */
    u32      ztype;    /* ZObject type ID */

    /* uzarray: zarray accessed as an array of u32 words */
    for (each ZObject) {
        Calculate zid from the screen depth;
        if (zid < 0)    zid = 0;                /* Clamp zid */
        if (zid > 1023) zid = 1023;
        zhptr->t.header = *(uzarray+zid);       /* Set next node */
        *(uzarray+zid) = ZHDR(zhptr, ztype);    /* Register to zarray */
    }

[Figure: zarray after ZObj1 is registered. One array element points to ZObj1, whose header holds
the end value (X); all other elements still hold the end value, and element 512 holds EndDL]


Drawing processing can be performed when this process is performed on all ZObjects and the completed 
arrays are called by gSPDisplayList. 

[Figure: gSPDisplayList calls zarray. The array elements link to lists containing ZObj1 through
ZObj6, each list terminated by the end value (X), and element 512 holds EndDL]


In the above example, drawing is performed in the order ZObj3 -> ZObj1 -> ZObj4 -> ZObj5 -> ZObj2
-> ZObj6.
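
The registration and drawing order described in this section can be simulated on a host machine
as sketched below. The sketch uses integer object indices in place of ZObject pointers and -1 in
place of G_ZOBJ_NONE; the names buckets_init, buckets_register, and buckets_draw_order are
hypothetical and are not part of the microcode or its libraries.

```c
#include <assert.h>

#define DEPTH_STEPS 1024
#define MAX_OBJS    64
#define END_VALUE   (-1)   /* stands in for G_ZOBJ_NONE */

static int bucket_head[DEPTH_STEPS];  /* plays the role of the zarray slots */
static int next_obj[MAX_OBJS];        /* plays the role of each ZObject header */

/* Fill every slot with the end value, as in the array initialization. */
static void buckets_init(void)
{
    for (int i = 0; i < DEPTH_STEPS; i++)
        bucket_head[i] = END_VALUE;
}

/* Register object `obj` at depth step `zid` (clamped to the array). */
static void buckets_register(int obj, int zid)
{
    assert(obj >= 0 && obj < MAX_OBJS);
    if (zid < 0) zid = 0;
    if (zid > DEPTH_STEPS - 1) zid = DEPTH_STEPS - 1;
    next_obj[obj] = bucket_head[zid];   /* set next node */
    bucket_head[zid] = obj;             /* register to the array */
}

/* Walk the slots in array order and record the drawing order. */
static int buckets_draw_order(int *out, int max_out)
{
    int n = 0;
    for (int i = 0; i < DEPTH_STEPS; i++)
        for (int o = bucket_head[i]; o != END_VALUE && n < max_out; o = next_obj[o])
            out[n++] = o;
    return n;
}
```

Note that objects registered to the same slot are drawn in reverse order of registration, because
each new object is inserted at the head of the list.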



ZObject Data Formats 

Z-Sort Microcode supports five types of ZObjects and the data required to draw each differs. The five 
types of structures for storing each type of ZObject data are explained below. 

zShTri Structure

The zShTri structure is used for drawing a triangle with smooth shading and no texture. The following
three groups of zShVtx vertex data are necessary for specifying this shape.

    typedef struct {
        s16 x, y;          /* Vertex screen coordinates (s10.2) */
        u8  r, g, b, a;    /* Each color in vertex 0..255 */
    } zShVtx;

The zShTri structure has the following data format.

            +0                +4                +7
            Hdr               RDP cmd
    V0      X       Y         R   G   B   A
    V1      X       Y         R   G   B   A
    V2      X       Y         R   G   B   A

    typedef struct {
        zHeader *header;     /* Information on next ZObject */
        Gfx     *rdpcmd1;    /* Pre-processing DP command */
        zShVtx   v[3];       /* Vertex data */
    } zShTri_t;

    typedef struct {         /* Structure for word access */
        zHeader *header;
        Gfx     *rdpcmd1;
        u32      xy0, clr0;
        u32      xy1, clr1;
        u32      xy2, clr2;
    } zShTri_w;

    typedef union {
        zShTri_t t;
        zShTri_w w;
        u64      force_structure_alignment;
    } zShTri;

A triangle formed from the three vertices specified by this structure is drawn. The facing of the
triangle is not taken into consideration; the triangle is drawn regardless of the direction it
faces. When back-facing triangles should not be drawn, have the CPU determine front and back when
it creates the ZObject data, and register only the front-facing ZObjects. The front/back
determination is the same for the other polygon ZObjects.

When the ZObjects are lined up by the list structure, the member variable header holds the pointer to the
next ZObject.

The member variable rdpcmd1 is used to change the current RDP processing mode. Specify the RDP
command DL string to be sent to the RDP before drawing the ZObject. For details on rdpcmd1, see
"Controlling RDP Commands with RDPcmd Parameters" on page 23.

zShQuad Structure

The zShQuad structure is used for drawing a quadrangle with smooth shading and no texture. Four
groups of zShVtx vertex data are necessary for specifying this shape.

With zShQuad, a quadrangle is drawn by drawing the two triangles V0-V1-V2 and V1-V2-V3.

[Figure: quadrangle with vertices V0, V1, V2, and V3, split into two triangles along the V1-V2
diagonal]

The zShQuad structure has the following data format.

            +0                +4                +7
            Hdr               RDP cmd
    V0      X       Y         R   G   B   A
    V1      X       Y         R   G   B   A
    V2      X       Y         R   G   B   A
    V3      X       Y         R   G   B   A

    typedef struct {
        zHeader *header;     /* Information on next ZObject */
        Gfx     *rdpcmd1;    /* Pre-processing DP command */
        zShVtx   v[4];       /* Vertex data */
    } zShQuad_t;

    typedef struct {         /* Structure for word access */
        zHeader *header;
        Gfx     *rdpcmd1;
        u32      xy0, clr0;
        u32      xy1, clr1;
        u32      xy2, clr2;
        u32      xy3, clr3;
    } zShQuad_w;

    typedef union {
        zShQuad_t t;
        zShQuad_w w;
        u64       force_structure_alignment;
    } zShQuad;



Memory requirements differ for drawing the same quadrangle using one zShQuad or two
zShTri ZObjects. Using zShQuad requires less memory, a significant advantage.

In addition, RDP drawing performance can be greatly improved by using the CPU to dynamically change
the quadrangle's dividing line to better suit RDP drawing. Specifically, compare the absolute value of the
Y coordinate difference along the V0-V3 diagonal (ABS(Y0-Y3)) to the absolute value of the Y
coordinate difference along the V1-V2 diagonal (ABS(Y1-Y2)). Then, set the ZObject data so that the
quadrangle is drawn as two triangles divided along the diagonal with the smaller absolute value. Refer
to the following algorithm.

    zShQuad *zquad;

    if (ABS(Y0-Y3) > ABS(Y1-Y2)) {
        /* Divide at diagonal V1-V2, divide into V0-V1-V2 and V1-V2-V3 */
        zquad->t.v[0] = V0; zquad->t.v[1] = V1;
        zquad->t.v[2] = V2; zquad->t.v[3] = V3;
    } else {
        /* Divide at diagonal V0-V3, divide into V1-V0-V3 and V0-V3-V2 */
        zquad->t.v[0] = V1; zquad->t.v[1] = V0;
        zquad->t.v[2] = V3; zquad->t.v[3] = V2;
    }
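
The diagonal selection above can be written as a small self-contained function, sketched below. Pt
and choose_diagonal are hypothetical names used only for this illustration; the vertices are
assumed to be given in the same order the text uses (triangles v[0]-v[1]-v[2] and
v[1]-v[2]-v[3]).

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y; } Pt;

/* Reorder four corners so the quad is split along the diagonal with the
 * smaller Y extent. Returns 1 if the vertices were reordered, 0 if not. */
static int choose_diagonal(Pt v[4])
{
    assert(v != 0);
    if (abs(v[0].y - v[3].y) > abs(v[1].y - v[2].y)) {
        return 0;   /* keep the V1-V2 diagonal */
    }
    /* Switch to the V0-V3 diagonal: the new order is V1, V0, V3, V2. */
    Pt t = v[0];
    v[0] = v[1];
    v[1] = t;
    t = v[2];
    v[2] = v[3];
    v[3] = t;
    return 1;
}
```

In a real program the same swap would have to be applied to the colors (and, for textured
quadrangles, to the texture coordinates and invw values) that travel with each vertex.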

However, since the diagonal to be selected as the dividing line is not known in advance, the four
specified vertices must lie in the same plane so that, whichever diagonal is selected, the division
into triangles causes no problems. Also, when using texture or smooth shading, the texture coordinate
values (s, t) or color values (r, g, b, a) must be set so that they are free of contradictions. (A
poor example is included in the sample program cubes-1.)

To be specific, consider the case of texture mapping. When the vectors V1, V2, and V3 are defined as:

    V1 = (v1 - v0), V2 = (v2 - v0), V3 = (v3 - v0)

for v0 (x0, y0, z0, s0, t0), v1 (x1, y1, z1, s1, t1), v2 (x2, y2, z2, s2, t2),
and v3 (x3, y3, z3, s3, t3), real numbers a and b must exist that satisfy:

    V2 = a * V1 + b * V3.

Geometrically, the four vertices in the 5-dimensional coordinate space that includes s and t must lie
in the same plane.

Similarly, when smooth shading and lighting are used, the color values (r, g, b) or the normal
vectors (nx, ny, nz) must satisfy the above relationship.

For example, the vertices in the onetri demo, below, are not good for a quadrangle.

    onetri/static.c:

    static Vtx shade_vtx[] = {
        { -64,  64, -5, 0, 0, 0, 0,    0xff, 0,    0xff },
        {  64,  64, -5, 0, 0, 0, 0,    0,    0,    0xff },
        {  64, -64, -5, 0, 0, 0, 0,    0,    0xff, 0xff },
        { -64, -64, -5, 0, 0, 0, 0xff, 0,    0,    0xff },    <- This part
    };

There would be no problem, however, if this were expressed as illustrated below.

    static Vtx shade_vtx[] = {
        { -64,  64, -5, 0, 0, 0, 0, 0xff, 0,    0xff },
        {  64,  64, -5, 0, 0, 0, 0, 0,    0,    0xff },
        {  64, -64, -5, 0, 0, 0, 0, 0,    0xff, 0xff },
        { -64, -64, -5, 0, 0, 0, 0, 0xff, 0xff, 0xff },
    };

In other words, pay close attention when the values change continuously between the vertices. No
problem exists with flat shading, in which the color values do not change between the vertices.
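
The condition V2 = a * V1 + b * V3 can be checked numerically, as sketched below. The sketch is a
simplification: it solves for a and b from the x and y coordinates only (assuming V1 and V3 are not
parallel) and then tests a single attribute channel such as one color component. The function name
attr_is_planar is hypothetical.

```c
#include <assert.h>

/* Returns 1 if the attribute values c[0..3] at the four corners are
 * consistent with the positions p[0..3], i.e. if the same factors a, b
 * that satisfy P2 = a*P1 + b*P3 also satisfy C2 = a*C1 + b*C3, where
 * Pk = p[k] - p[0] and Ck = c[k] - c[0]. Returns 0 otherwise. */
static int attr_is_planar(const double p[4][2], const double c[4])
{
    double p1x = p[1][0] - p[0][0], p1y = p[1][1] - p[0][1];
    double p2x = p[2][0] - p[0][0], p2y = p[2][1] - p[0][1];
    double p3x = p[3][0] - p[0][0], p3y = p[3][1] - p[0][1];

    double det = p1x * p3y - p3x * p1y;
    assert(det != 0.0);   /* V1 and V3 must not be parallel */

    /* Cramer's rule for a*P1 + b*P3 = P2 */
    double a = (p2x * p3y - p3x * p2y) / det;
    double b = (p1x * p2y - p2x * p1y) / det;

    double diff = (c[2] - c[0]) - (a * (c[1] - c[0]) + b * (c[3] - c[0]));
    if (diff < 0.0) diff = -diff;
    return diff < 1e-6;
}
```

With the onetri positions, a = b = 1. The red channel of the first vertex list (0, 0, 0, 0xff)
fails this check, while every channel of the corrected list passes.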

zShQuad does not cull the back face of the quadrangle, just as with the other ZObjects. Plan for this
when the CPU creates ZObject data.

When the ZObjects are lined up by the list structure, the member variable header holds the pointer to the
next ZObject. If there is no next ZObject, assign the end value G_ZOBJ_NONE.

The member variable rdpcmd1 is used to change the current RDP processing mode. Specify the RDP
command DL string to be sent to the RDP before drawing the ZObject. For details on rdpcmd1, see
"Controlling RDP Commands with RDPcmd Parameters" on page 23.

zTxTri Structure

The zTxTri structure is for drawing textured triangles with smooth shading. The three groups of
zTxVtx vertex data necessary for specifying this shape are given below.

    typedef struct {
        s16 x, y;          /* Vertex screen coordinates (s10.2) */
        u8  r, g, b, a;    /* Each color in vertex 0..255 */
        s16 s, t;          /* Texture coordinates in vertex (s10.5) */
        s32 invw;          /* Texture perspective correction parameter 1/W
                              (s15.16) (proportional to inverse of distance
                              from viewpoint) */
    } zTxVtx;

The member variable invw is found, as shown below, from the W of the coordinate value (x, y, z, w)
obtained by multiplying the coordinate value of each vertex by the MP matrix. Here, perspNorm
is the parameter for normalizing the perspective transformation that can be obtained by the
guPerspective function.

    invw = (1<<30) / (perspNorm * W);

The RDP uses this value to correct the texture perspective. In the microcode's arithmetic operation
processing GBI, this value can be found in the same way as perspective transformation.

The zTxTri structure has the following data format.

            +0            +4            +8            +c          +f
            Hdr           RDP cmd 1     RDP cmd 2     RDP cmd 3
    V0      X     Y       R  G  B  A    S     T       invW
    V1      X     Y       R  G  B  A    S     T       invW
    V2      X     Y       R  G  B  A    S     T       invW

    typedef struct {
        zHeader *header;     /* Information on next ZObject */
        Gfx     *rdpcmd1;    /* Pre-processing DP command 1 */
        Gfx     *rdpcmd2;    /* Pre-processing DP command 2 */
        Gfx     *rdpcmd3;    /* Pre-processing DP command 3 */
        zTxVtx   v[3];       /* Vertex data */
    } zTxTri_t;

    typedef struct {         /* Structure for word access */
        zHeader *header;
        Gfx     *rdpcmd1;
        Gfx     *rdpcmd2;
        Gfx     *rdpcmd3;
        u32      xy0, clr0, st0, invw0;
        u32      xy1, clr1, st1, invw1;
        u32      xy2, clr2, st2, invw2;
    } zTxTri_w;

    typedef union {
        zTxTri_t t;
        zTxTri_w w;
        u64      force_structure_alignment;
    } zTxTri;

zTxTri does not cull the back face of the triangle, just as with the other ZObjects. Plan for this
when the CPU creates ZObject data.

When the ZObjects are lined up by the list structure, the member variable header holds the pointer to the
next ZObject. If there is no next ZObject, assign the end value G_ZOBJ_NONE.

The member variables rdpcmd1, 2, and 3 are used to change the current RDP processing mode and to
load the texture. Specify the three RDP command DL strings to be sent to the RDP before drawing the
ZObject. For details on rdpcmd1, 2, and 3, see "Controlling RDP Commands with RDPcmd
Parameters" on page 23.
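
The invw formula given for zTxVtx can be transcribed as the sketch below. It is only an
illustration: persp_norm and w are treated as plain real numbers, and the fixed-point scaling of
the values actually produced by guPerspective and the matrix multiplication is not covered.
compute_invw is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of invw = (1<<30) / (perspNorm * W).
 * Assumption: persp_norm and w are plain real numbers here; the
 * fixed-point formats used by the real libraries are not modeled. */
static int32_t compute_invw(double persp_norm, double w)
{
    /* A product of at least 1.0 keeps the quotient within s32 range. */
    assert(persp_norm * w >= 1.0);
    return (int32_t)((double)(1 << 30) / (persp_norm * w));
}
```

Under these assumptions, doubling W halves invw, which matches the description of invw as
proportional to the inverse of the distance from the viewpoint.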

zTxQuad Structure

The zTxQuad structure is for drawing textured quadrangles with smooth shading. Four groups of
zTxVtx vertex data are necessary for specifying this shape.

With zTxQuad, a quadrangle is drawn by drawing the two triangles V0-V1-V2 and V1-V2-V3.

The zTxQuad structure has the following data format.

            +0            +4            +8            +c          +f
            Hdr           RDP cmd 1     RDP cmd 2     RDP cmd 3
    V0      X     Y       R  G  B  A    S     T       invW
    V1      X     Y       R  G  B  A    S     T       invW
    V2      X     Y       R  G  B  A    S     T       invW
    V3      X     Y       R  G  B  A    S     T       invW

    typedef struct {
        zHeader *header;     /* Information on next ZObject */
        Gfx     *rdpcmd1;    /* Pre-processing DP command 1 */
        Gfx     *rdpcmd2;    /* Pre-processing DP command 2 */
        Gfx     *rdpcmd3;    /* Pre-processing DP command 3 */
        zTxVtx   v[4];       /* Vertex data */
    } zTxQuad_t;

    typedef struct {         /* Structure for word access */
        zHeader *header;
        Gfx     *rdpcmd1;
        Gfx     *rdpcmd2;
        Gfx     *rdpcmd3;
        u32      xy0, clr0, st0, invw0;
        u32      xy1, clr1, st1, invw1;
        u32      xy2, clr2, st2, invw2;
        u32      xy3, clr3, st3, invw3;
    } zTxQuad_w;

    typedef union {
        zTxQuad_t t;
        zTxQuad_w w;
        u64       force_structure_alignment;
    } zTxQuad;

For the advantages of using zTxQuad and performance enhancing techniques, see the explanation of
"zShQuad Structure" on page 17.

zTxQuad does not cull the back face of the quadrangle, just as with the other ZObjects. Plan for this
when the CPU creates ZObject data.

When the ZObjects are lined up by the list structure, the member variable header holds the pointer to the
next ZObject. If there is no next ZObject, assign the end value G_ZOBJ_NONE.

The member variables rdpcmd1, 2, and 3 are used to change the current RDP processing mode and to
load the texture. Specify the three RDP command DL strings to be sent to the RDP before drawing the
ZObject. For details on rdpcmd1, 2, and 3, see "Controlling RDP Commands with RDPcmd
Parameters" on page 23.

zNull Structure

The zNull structure is not for drawing polygons such as triangles and quadrangles. It is for
drawing rectangle areas by sending commands directly to the RDP (e.g., FillRectangle,
TextureRectangle).

Any usable RDP command can be specified, not only the commands for drawing rectangle areas.
As a result, a ZObject can be created that merely changes the Fog and Primitive colors and does not
actually draw anything.

            +0            +4            +8            +c          +f
            Hdr           RDP cmd 1     RDP cmd 2     RDP cmd 3

    typedef struct {
        zHeader *header;
        Gfx     *rdpcmd1;
        Gfx     *rdpcmd2;
        Gfx     *rdpcmd3;
    } zNull_t;

    typedef union {
        zNull_t t;
        u64     force_structure_alignment;
    } zNull;

When the ZObjects are lined up by the list structure, the member variable header holds the pointer to the
next ZObject. If there is no next ZObject, assign the end value G_ZOBJ_NONE.

Specify the three RDP command DL strings to be sent to the RDP when the ZObject is processed. For
details on rdpcmd1, 2, and 3, see "Controlling RDP Commands with RDPcmd Parameters" on page
23.

Controlling RDP Commands with RDPcmd Parameters 

Each ZObject structure has one or three RDP cmd areas. The status of the RDP during ZObject drawing
processing can be changed through these member variables.

To change the RDP status, use a dedicated DL that lists GBI commands. This is called the RDP
command string.

The RDP command string can contain, for the most part, only commands for controlling the status of
the RDP. In other words, the GBI commands that can be used in the RDP command string are limited.
The usable GBI commands are shown below. The operation of the GBI commands below is
the same as in the Fast3D-compatible microcode. GBI commands not listed below may not work
correctly.

GBI Commands Usable in RDP Command Strings

    gSPNoOp                      gDPNoOp
    gSPEndDisplayList

    gDPFillRectangle
    gSPTextureRectangle          gSPTextureRectangleFlip

    gDPSetColorImage             gDPSetDepthImage
    gDPSetTextureImage           gDPSetScissor

    gDPSetFillColor              gDPSetEnvColor
    gDPSetFogColor               gDPSetBlendColor
    gDPSetPrimColor              gDPSetPrimDepth

    gDPSetCombineMode            gDPSetConvert
    gDPSetKeyR                   gDPSetKeyGB

    gDPSetOtherMode
    gDPPipelineMode(*)           gDPSetCycleType(*)
    gDPSetTexturePersp(*)        gDPSetTextureDetail(*)
    gDPSetTextureLOD(*)          gDPSetTextureLUT(*)
    gDPSetTextureFilter(*)       gDPSetTextureConvert(*)
    gDPSetCombineKey(*)          gDPSetColorDither(*)
    gDPSetAlphaDither(*)         gDPSetAlphaCompare(*)
    gDPSetDepthSource(*)         gDPSetRenderMode(*)

    gDPSetTile                   gDPSetTileSize

    gDPLoadBlock
    gDPLoadTextureBlock          gDPLoadTextureBlockS
    gDPLoadTextureBlock_4b       gDPLoadTextureBlock_4bS
    gDPLoadTextureBlockYuv       gDPLoadTextureBlockYuvS
    gDPLoadMultiBlock            gDPLoadMultiBlockS
    gDPLoadMultiBlock_4b         gDPLoadMultiBlock_4bS

    gDPLoadTile
    gDPLoadTextureTile           gDPLoadTextureTile_4b
    gDPLoadTLUT_pal16            gDPLoadTLUT_pal256

    gDPLoadSync                  gDPPipeSync
    gDPTileSync                  gDPFullSync

One important note here is that gSPSegment cannot be used. Although segment addresses can
be used for gDPSetColorImage and the like, the segment value cannot be set within the RDP command
string. Also note that gSPBranchDL and gSPDisplayList cannot be used.

It is assumed that the three RDP Cmd areas rdpcmd1, rdpcmd2, and rdpcmd3 will be used as
follows.

    rdpcmd1: for setting RDP rendering mode
    rdpcmd2: for loading to TMEM (mainly, loading to total TMEM/front half of TMEM)
    rdpcmd3: for help in loading to TMEM (mainly, loading to TLUT/back half of TMEM)

Given this assumption, only rdpcmd1 is used for drawing graphics without texture (zShTri, zShQuad).
All three may be specified when drawing textured graphics (zTxTri, zTxQuad).

Z-Sort Microcode is different from the microcode using the Z Buffer function, in that it draws in order
from the back to the front. Thus, it cannot draw all polygons with the same texture consecutively.
Therefore, when using Z-Sort Microcode, each ZObject must be provided with texture information.
However, Z-Sort Microcode is equipped with a mechanism for minimizing the waste that results when
a texture that is already loaded to the TMEM is loaded again.

The pointer to the just-processed RDP command string is remembered. This is compared to the pointer
to the RDP command string of the current ZObject, and the command string is sent to the RDP only
when the two are different.

The microcode contains RDP command pointer memory areas in DMEM for the three RDP command
pointers rdpcmd1, rdpcmd2, and rdpcmd3 (tentatively called rdpcmd1_save, rdpcmd2_save, and
rdpcmd3_save). The algorithm for each process is written below in C language.

For zShTri and zShQuad (one RDP Cmd area):

    if (rdpcmd1 != rdpcmd1_save) {
        Processing of RDP command string indicated by rdpcmd1;
        rdpcmd1_save = rdpcmd1;
    }
    Drawing of ZObject;

The RDP command string for switching the RenderMode is usually set to rdpcmd1. A sample of an
RDP command string specific to rdpcmd1 is given below.

gsDPSetOtherMode is the GBI for setting a number of DP mode settings at once. Since many RDP
commands can be processed with a single instruction, using this command speeds up processing.
The commands marked (*) in the above table of GBI Commands Usable in RDP Command
Strings can be processed collectively by gDPSetOtherMode.

    #define OTHERMODE_A(cyc) (G_CYC_##cyc | G_PM_1PRIMITIVE | G_TP_PERSP | \
                              G_TD_CLAMP | G_TL_TILE | G_TT_NONE | G_TF_BILERP | \
                              G_TC_FILT | G_CK_NONE | G_CD_DISABLE | G_AD_DISABLE)
    #define OTHERMODE_B(rm)  (G_AC_NONE | G_ZS_PRIM | G_RM_##rm | G_RM_##rm##2)

    /* Shade Triangle mode switching */
    Gfx modeShTri[] = {
        gsDPPipeSync(),
        gsDPSetOtherMode(OTHERMODE_A(1CYCLE), OTHERMODE_B(RA_OPA_SURF)),
        gsDPSetCombineMode(G_CC_SHADE, G_CC_SHADE),
        gsSPEndDisplayList(),
    };

For zTxTri and zTxQuad (three RDP Cmd areas):

    if (rdpcmd1 != rdpcmd1_save) {
        rdpcmd1_save = rdpcmd1;
        Processing of RDP command string indicated by rdpcmd1;
    }
    if (rdpcmd2 != rdpcmd2_save) {
        rdpcmd2_save = rdpcmd2;
        Processing of RDP command string indicated by rdpcmd2;
    }
    if (rdpcmd3 != rdpcmd3_save) {
        rdpcmd3_save = rdpcmd3;
        if (rdpcmd3 != NULL) {
            Processing of RDP command string indicated by rdpcmd3;
        }
    }

As with zShTri and zShQuad, the RDP command string for switching the RenderMode is set to
rdpcmd1. A sample of an RDP command string specific to rdpcmd1 is given below. Palette-switching
for 4b CI textures, etc., can also be included here.

    /* Textured Triangle mode switching */
    Gfx modeTxTri1[] = {
        gsDPPipeSync(),
        gsDPSetOtherMode(OTHERMODE_A(1CYCLE), OTHERMODE_B(RA_OPA_SURF)),
        gsDPSetCombineMode(G_CC_MODULATERGB, G_CC_MODULATERGB),
        gsSPEndDisplayList(),
    };

Set the RDP command string for loading the texture to rdpcmd2. A sample of an RDP command string
specific to rdpcmd2 is given below.

    Gfx modeTxTri2[] = {
        gsDPPipeSync(),
        gsDPLoadTextureBlock(brick, G_IM_FMT_RGBA, G_IM_SIZ_16b, 32, 32, 0,
                             G_TX_WRAP | G_TX_MIRROR, G_TX_WRAP | G_TX_MIRROR,
                             5, 5, G_TX_NOLOD, G_TX_NOLOD),
        gsSPEndDisplayList(),
    };

Make settings to rdpcmd3 the same way as rdpcmd2. Although rdpcmd3 is presumably used for TLUT
loading, it can also be used for texture loading.

If rdpcmd3 is unnecessary, assign NULL (= 0x00000000). In this case, rdpcmd3_save is *cleared*
to NULL and no RDP command string is processed for rdpcmd3.

For zNull (three RDP command areas):

    if (rdpcmd1 != NULL && rdpcmd1 != rdpcmd1_save) {
        rdpcmd1_save = rdpcmd1;
        Processing of RDP command string indicated by rdpcmd1;
    }
    if (rdpcmd2 != NULL && rdpcmd2 != rdpcmd2_save) {
        rdpcmd2_save = rdpcmd2;
        Processing of RDP command string indicated by rdpcmd2;
    }
    if (rdpcmd3 != NULL && rdpcmd3 != rdpcmd3_save) {
        rdpcmd3_save = rdpcmd3;
        Processing of RDP command string indicated by rdpcmd3;
    }

There are no particular assumptions regarding zNull, so it may be used freely. As can be seen from
the above algorithm, when NULL (= 0x00000000) is set to one of the rdpcmd members of a zNull, no
RDP commands are processed for it. The value of the corresponding rdpcmd_save at this time is *saved*.
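
The difference between the textured-polygon rule (a NULL rdpcmd3 clears the saved pointer) and the
zNull rule (a NULL pointer leaves the saved pointer untouched) can be simulated as sketched below.
RdpCache, process_textured, and process_null are hypothetical names for this illustration; the
sketch only counts how many command strings would be sent to the RDP.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const void *save[3];   /* stands in for rdpcmd1_save..rdpcmd3_save */
    int sent;              /* number of command strings sent to the RDP */
} RdpCache;

/* zTxTri/zTxQuad rule: the saved pointer is always updated; the string is
 * sent only if the pointer changed (and, for rdpcmd3, is not NULL). */
static void process_textured(RdpCache *c, const void *cmd[3])
{
    assert(c != NULL && cmd != NULL);
    for (int i = 0; i < 3; i++) {
        if (cmd[i] != c->save[i]) {
            c->save[i] = cmd[i];
            if (i < 2 || cmd[i] != NULL)
                c->sent++;
        }
    }
}

/* zNull rule: a NULL pointer is skipped and leaves the saved value intact. */
static void process_null(RdpCache *c, const void *cmd[3])
{
    assert(c != NULL && cmd != NULL);
    for (int i = 0; i < 3; i++) {
        if (cmd[i] != NULL && cmd[i] != c->save[i]) {
            c->save[i] = cmd[i];
            c->sent++;
        }
    }
}
```

Consecutive ZObjects that share the same command string pointers therefore cost nothing extra,
which is why grouping ZObjects by rendering mode and texture within a depth step can reduce RDP
traffic.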


Clear Screen and Other Drawing Processing 

One important note regarding the use of Z-Sort Microcode is that RDP commands cannot be written
directly in a normal Display List. This is because the microcode is internally divided into SP
command processing and DP command processing, a division that affects the number of microcode
instructions and the processing speed.

Normally, background filling processes, such as Clear Screen, are necessary before all of the ZObjects
are drawn. In Fast3D-compatible microcodes, such a GBI string is usually created in a static area and
is called from the Display List side.

So that an RDP command string for controlling such DP operations as screen clearing can be called
from the normal Display List, Z-Sort Microcode provides the following GBI command. The GBI
commands that can be used in this RDP command string are limited to the same ones that can be used
during ZObject drawing. Refer to the preceding table. For specific examples, refer to the sample
program cubes-1.

    gSPZRdpCmd(Gfx *gp, Gfx *rdpcmd)

    rdpcmd    Pointer to the RDP command string

Processes the RDP command string. The RDP commands that can be called, however, are limited.
(Refer to the table "GBI Commands Usable in RDP Command Strings" on page 23.)

Chapter 4 Arithmetic Operations
Display Objects and Arithmetic Operations

As explained previously, Z-Sort Microcode can draw four types of polygons: zShTri, zShQuad,
zTxTri, and zTxQuad. Though this initially appears to be a small number, many more shapes can be
drawn by combining these basic four. This microcode offers the following three principal processing
operations.

(Operation A)  gSPZMultMPMtx

    Model coordinate vertex data +
    MxP matrix                       ==> Screen coordinate vertex data

(Operation B)  gSPZLight / gSPZLightMaterial

    Normal vector data +
    Material data +
    Light data +
    Modelview matrix                 ==> Color data

(Operation C)  gSPZLight / gSPZLightMaterial

    Normal vector data +
    Line of sight (LookAt) data +
    Modelview matrix                 ==> Texture coordinate (environment map) data

In all polygon ZObjects, (Operation A) must be performed to find the screen coordinate vertex data.
Also, (Operation B) is required when processing light and (Operation C) is required when processing the
environment map.

Each GBI used to perform operations A, B, and C (gSPZMultMPMtx,
gSPZLight/gSPZLightMaterial), however, is insufficient by itself. The vertex data and
transformation parameters (matrixes, etc.) must be prepared and loaded into the DMEM in the RSP
before the GBI that performs the operation is executed. In addition, the operation results must be
written back to the DRAM from the DMEM.

Work Area for Operations in DMEM 

Z-Sort Microcode has a GBI for specialized arithmetic operations to perform transformation processing to 
the 3D mode! screen coordinate system, lighting calculations, and matrix operations using the RSP. 

By combining multiple operations, such values as coordinate and color values necessary to draw 
ZObjects to the screen can be obtained. 

For example, the following GBI commands are combined to transform mode! coordinates to screen 
coordinates. 



1. 


gSPZViewPort 


Sets VIEWPORT. 


2. 


gSPZPerspNormalize 


Sets pass normalization factor. 


3. 


gSPZSetMtx 


Loads PROJECTION matrix to work area in DMEM. 


4. 


gSPZSetMtx 


Loads MODELVIEW matrix to work area in DMEM. 


5. 


gSPZMtxCat 


Multiplies PROJECTION and MODELVIEW matrixes. 


6. 


gSPZSetUMem 


Loads model coordinate values inside DRAM to work area in 
DMEM. 


7. 


gSPZMuftMPMtx 


Transforms model coordinate values to screen coordinate 
values. 


8. 


gSPZGetUMem 


Outputs screen coordinate vaiuesto DRAM. 

In Z-Sort Microcode, the work areas used in arithmetic operations are located in DMEM. 
There are two types of work area, one for general-purpose use and one for matrices, with the 
following sizes. The general-purpose work area is called the user area. 

General purpose work area (User area)     Total 2048 bytes 
Matrix work area                          Total  192 bytes 
    (Breakdown)  ModelView                        64 bytes 
                 Projection                       64 bytes 
                 M x P                            64 bytes 

The user area occupies addresses 0 to 2047. The application creator determines how this area is 
used. 

In libz-sort of the sample program cubes-1, the user area is used as follows. Although the areas 
overlap, this does not cause a problem because they are used at different times. Refer to this layout 
as an example of how to use the user area. 

1200-1919:  stores source of model coordinate values (can hold up to 120 groups) 
   0-1919:  stores results of screen coordinate value calculations (can hold up to 120 groups) 
   0-383:   stores source of normal ray vectors (can hold up to 128 groups) 
 512-1023:  stores source of material colors (can hold up to 128 groups) 
 512-1023:  stores results of lighting calculations (can hold up to 128 groups) 
1024-1535:  stores results of environment texture map coordinate calculations 
            (can hold up to 128 groups) 
1920-2047:  stores light data (3 DEFUSE lights + 1 AMBIENT + environment map) 

The user can divide up the user area freely. The matrix area, however, is reserved for storing 
matrix data and normally cannot be used for other purposes. Any address in the user area can be 
specified, whereas in the matrix area one of the areas (GZM_MMTX, GZM_PMTX, or GZM_MPMTX) 
is specified. In addition, addresses 0-63 and 64-127 at the head of the user area can be used as 
matrix areas. Therefore, the following five areas can be used for matrices. Note that the matrix areas 
are named ModelView/Projection/MxP for ease of understanding; functionally they are identical. For 
example, the ModelView matrix can be stored in the MxP matrix area. 

GZM_MMTX     ModelView matrix area 
GZM_PMTX     Projection matrix area 
GZM_MPMTX    MxP matrix area 
GZM_USER0    User area address 0-63 
GZM_USER1    User area address 64-127 

GBIs for arithmetic operations can be executed from either the Main DL or the Sub DL. Take care, 
therefore, when both DLs read and write the user area. When the Main and Sub DLs are processed in 
parallel, Sub DL GBIs can destroy data calculated by Main DL GBIs. Accessing the user area from 
both DLs is therefore not recommended; it is better to decide in advance which DL will perform the 
arithmetic operations. 

GBI List 

This is the list of GBIs for arithmetic operations. 

gSPZSetUMem          Writes data to user area 
gSPZGetUMem          Reads data in user area 
gSPZSetMtx           Writes matrix 
gSPZGetMtx           Reads matrix 
gSPZMtxCat           Multiplies matrixes 
gSPZMtxTrnsp3x3      Inverts (transposes) 3x3 element of matrix 
gSPZViewPort         Sets VIEWPORT 
gSPZMultMPMtx        Transforms model coordinate values to screen coordinate values 
gSPZSetAmbient       Writes Ambient light (environment light) 
gSPZSetDefuse        Writes Defuse light (diffused light) 
gSPZSetLookAt        Writes LookAt structure data 
gSPZXfmLights        Performs light parameter pre-processing 
gSPZLight            Performs light calculations 
gSPZLightMaterial    Performs light calculations taking material color into consideration 
gSPZMixS16           Performs s16 numeric interpolation 
gSPZMixS8            Performs s8 numeric interpolation 
gSPZMixU8            Performs u8 numeric interpolation 



GBI Functions 

This section explains the GBIs for arithmetic operations. 

gSPZSetUMem (Gfx *gp, u32 umem, u32 size, u64 *adrs) 

umem    user area address for write destination (0-2040) 
size    write size (8-2048) 
adrs    pointer to write source in DRAM 

This GBI writes data to the user area. umem and size must be multiples of 8, and adrs must be on an 
8-byte boundary. If 10 bytes of data are needed, specify 16 bytes. 
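
The rounding rule for size can be sketched as a small host-side helper. This is only an illustration: the name umem_round_up8 is hypothetical and is not part of gz-sort.h. 

```c
#include <stdint.h>

/* Hypothetical helper (not part of gz-sort.h): round a byte count up to
 * the next multiple of 8, as required for the size argument. */
static uint32_t umem_round_up8(uint32_t bytes)
{
    return (bytes + 7u) & ~7u;
}
```

For example, umem_round_up8(10) yields 16, which matches the 10-byte case described above. 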

gSPZGetUMem (Gfx *gp, u32 umem, u32 size, u64 *adrs) 

umem    user area address for read source (0-2040) 
size    read size (8-2048) 
adrs    pointer to read destination in DRAM 

This GBI reads data from the user area. umem and size must be multiples of 8, and adrs must be on an 
8-byte boundary. 

gSPZSetMtx (Gfx *gp, u32 mid, Mtx *mptr) 

mid     matrix area for write destination 
mptr    pointer to write source in DRAM 

This GBI writes matrix data in DRAM to the matrix area. Generally, one of GZM_MMTX, GZM_PMTX, or 
GZM_MPMTX is specified for mid. However, the 128 bytes at the head of the user area can also be used. 
To do so, specify GZM_USER0 or GZM_USER1. This allows addresses 0-63 and 64-127 at the head of 
the user area to be used as matrix areas. 

gSPZGetMtx (Gfx *gp, u32 mid, Mtx *mptr) 

mid     matrix area for read source 
mptr    pointer to read destination in DRAM 

This GBI reads matrix data from the matrix area to DRAM. Generally, one of GZM_MMTX, GZM_PMTX, or 
GZM_MPMTX is specified for mid. However, the 128 bytes at the head of the user area can also be used. 
To do so, specify GZM_USER0 or GZM_USER1. This allows addresses 0-63 and 64-127 at the head of 
the user area to be used as matrix areas. 

gSPZMtxCat (Gfx *gp, u32 mids, u32 midt, u32 midd) 

mids    matrix area S 
midt    matrix area T 
midd    matrix area D 

This GBI calculates (Matrix D) = (Matrix S) * (Matrix T). Generally, one of GZM_MMTX, 
GZM_PMTX, or GZM_MPMTX is specified for each of mids, midt, and midd. However, the 128 bytes at the 
head of the user area can also be used. To do so, specify GZM_USER0 or GZM_USER1. This allows 
addresses 0-63 and 64-127 at the head of the user area to be used as matrix areas. 

When the matrix T and matrix D areas are the same, however, the operation may not perform as expected. 
There is no problem when areas S and D or S and T are the same. 

gSPZMtxTrnsp3x3 (Gfx *gp, u32 mid) 

mid     matrix area to be transposed 

This GBI transposes the 3x3 element of the matrix (x, y, z). When the matrix is a rotation matrix, the 
transposed result is the reverse rotation of the source matrix. This transposed matrix is used mainly 
for light processing. 

One of GZM_MMTX, GZM_PMTX, or GZM_MPMTX is specified for mid. However, the 128 bytes at the head 
of the user area can also be used. To do so, specify GZM_USER0 or GZM_USER1. This allows addresses 
0-63 and 64-127 at the head of the user area to be used as matrix areas. 

gSPZViewPort (Gfx *gp, Vp *vp) 

vp      pointer to VIEWPORT data 

This GBI is roughly the same as the gSPViewport GBI in F3DEX. Although it sets the VIEWPORT, 
there are differences in the VIEWPORT data parameters. In Z-Sort Microcode, the parameters that control 
Fog are specified in vscale[3] and vtrans[3] of the Vp structure member variables vscale and vtrans, 
using the following macros, 

vp->vp.vscale[3] = GZ_VIEWPORT_FOG_S (in, out); 
vp->vp.vtrans[3] = GZ_VIEWPORT_FOG_T (in, out); 

where: 

in:     Fog start distance 
out:    Fog end distance 

A negative value must be set for vscale[1] to make the top part of the screen positive, i.e., so that 
the right, top, and front directions are positive (clockwise system). 






To start Fog at a distance of 3000 from the viewpoint and make the background color 
uniform at a distance of 4000, initialize as follows. 

Vp viewport = { 
    SCREEN_WD*2, -SCREEN_HT*2, G_MAXZ/2, GZ_VIEWPORT_FOG_S (3000, 4000), 
    SCREEN_WD*2,  SCREEN_HT*2, G_MAXZ/2, GZ_VIEWPORT_FOG_T (3000, 4000), 
}; 

gSPZMultMPMtx (Gfx *gp, u32 mid, u32 src, u32 num, u32 dest) 

mid     MxP matrix 
src     head address in user area that stores vertex model coordinate values 
num     number of vertices to be processed 
dest    head address in user area that stores vertex screen coordinate values 
        after coordinate transformation 

This GBI treats the data at the src position in the user area as 16-bit x, y, z values. Each vertex is 
multiplied by the 4x4 matrix specified by mid, and the result (X, Y, Z, W) is normalized so that W=1. The 
screen coordinate value is then obtained by applying the Viewport transformation to the normalized 
coordinates. At the same time, the FOG parameter and the clipping flags are calculated, and the data is 
output to the dest position. Next, 6 is added to src and 16 to dest, and processing proceeds to the next 
vertex. num vertices are processed consecutively. 

The formats of the coordinate values to be input and output at this point are defined as the 
zVtxSrc and zVtxDest structures in the header file gz-sort.h, as follows. 

typedef struct { 
    s16 x, y, z;    /* Vertex model coordinate values (s10.2) */ 
} zVtxSrc;          /* Size 6 bytes */ 

typedef struct { 
    s16 sx, sy;     /* Vertex screen coordinate values (s10.2) */ 
    s32 invw;       /* Texture perspective correction parameter 1/W (s15.16) */ 
    s16 xi, yi;     /* X, Y values before normalization (integers only) */ 
    u8  cc;         /* Flag for clip processing determination */ 
    u8  fog;        /* FOG factor */ 
    s16 wi;         /* W value (integers only) */ 
} zVtxDest;         /* Size total 16 bytes */ 

Since the size of the zVtxSrc structure is 6 bytes, pay special attention to the 8-byte alignment when 
transferring data by DMA using gSPZSetUMem. Because the transfer size must be a multiple of 8, the 
DMA transfer size must be rounded up to a multiple of 8. 

Since the size of the zVtxDest structure is 16 bytes, at most 128 of them fit in the 2048-byte user area. 
As a result, the num range is from 1 to 128. (In practice, since lighting and other 
processes also use the user area, the range is usually smaller than this.) The num*16-byte area 
starting at the dest address is overwritten; the exception is when num is 3 or less. In this case, the 64-byte 
area starting at the position specified by dest is overwritten. 

For example, when num is 3 and dest is 0, the correct values after transformation are stored at 
addresses 0-47 and meaningless data is written to addresses 48-63. Be careful here because the 
original contents of addresses 48-63 will be destroyed. This specification is necessary for improving the 
calculation speed. 
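
The structure sizes and the overwrite rule described above can be checked with a host-side sketch. The Demo names below are illustrative only and are not taken from gz-sort.h; the <stdint.h> types stand in for the s16/s32/u8 typedefs. 

```c
#include <stdint.h>

/* Host-side mirrors of the two structures (illustrative names). */
typedef struct {
    int16_t x, y, z;
} DemoVtxSrc;                       /* expected size: 6 bytes  */

typedef struct {
    int16_t sx, sy;
    int32_t invw;
    int16_t xi, yi;
    uint8_t cc;
    uint8_t fog;
    int16_t wi;
} DemoVtxDest;                      /* expected size: 16 bytes */

/* Bytes of user area overwritten starting at dest for num vertices:
 * num * 16, except that 64 bytes are written when num is 3 or less. */
static uint32_t demo_dest_bytes_written(uint32_t num)
{
    return (num <= 3u) ? 64u : num * 16u;
}
```

With these definitions, demo_dest_bytes_written(3) is 64 and demo_dest_bytes_written(120) is 1920, so a three-vertex transform at dest = 0 also clobbers addresses 48-63. 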

The routine for this GBI is illustrated below. The src and dest areas may overlap as long as 
unprocessed src data is not overwritten by the dest output. In libz-sort of the sample program 
cubes-1, with src = 1200-1919 and dest = 0-1919, a maximum of 120 vertices can be processed. 

for (i = 0; i < num; i++) { 
    *dest = MultMP (*src); 
    src  += 6; 
    dest += 16; 
} 
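
The 120-vertex limit can be reproduced with a small host-side sketch. It assumes the strictly per-vertex model of the loop above (the actual microcode may process vertices in groups), and every name in it is hypothetical. 

```c
#include <stdint.h>

#define DEMO_SRC_STRIDE    6u   /* bytes per zVtxSrc entry  */
#define DEMO_DEST_STRIDE  16u   /* bytes per zVtxDest entry */
#define DEMO_UMEM_SIZE  2048u   /* size of the user area    */

/* Returns the largest num for which, in the per-vertex loop model,
 * writing result i never touches a source vertex that is still unread
 * and both areas stay inside the user area. Assumes dest <= src. */
static uint32_t demo_max_overlapped_vertices(uint32_t src, uint32_t dest)
{
    uint32_t n = 0;
    for (;;) {
        uint32_t dest_end = dest + (n + 1u) * DEMO_DEST_STRIDE; /* end of result n  */
        uint32_t next_src = src + (n + 1u) * DEMO_SRC_STRIDE;   /* first unread src */
        if (dest_end > next_src || dest_end > DEMO_UMEM_SIZE ||
            next_src > DEMO_UMEM_SIZE)
            break;
        n++;
    }
    return n;
}
```

Under these assumptions, demo_max_overlapped_vertices(1200, 0) returns 120, the figure quoted for libz-sort. 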

The member variable invw is found as shown below from the W component of the coordinate value 
(X, Y, Z, W) obtained by multiplying the coordinate value of each vertex (x, y, z, 1) by the MxP matrix. 
Here, perspNorm is the parameter for normalizing the perspective transformation set by 
gSPZPerspNormalize. 

invw = (1<<30) / (perspNorm * W); 

The invw value can be used, as is, to set zTxTri/zTxQuad for the ZObject. 

When creating the MP matrix using guPerspective and guLookAt, the wi value usually indicates the 
distance from the viewpoint. Z-Sort can also be performed by selecting this value as the screen 
depth. 

Also, xi, yi are the non-normalized coordinate values before perspective transformation. These values 
can be used mainly for clipping processing. Z-Sort Microcode does not support clipping processing in the 
microcode. However, clipping can be performed using the xi, yi, wi values in the CPU program. 
The details are explained later in this manual. 

By checking the value of the clipping processing determination flag cc, it can easily be determined 
whether that vertex is in the Viewport (visible area). Each cc flag has one of the following meanings. 

    X coordinate is left of Left Plane of visible area 
    X coordinate is right of Right Plane of visible area 
    Y coordinate is above Top Plane of visible area 
    Y coordinate is below Bottom Plane of visible area 
    Z coordinate is closer than Near Plane of visible area 
    Z coordinate is farther than Far Plane of visible area 

To determine whether the triangle comprised of the vertices v0, v1, and v2 is completely outside the 
screen, AND the cc values of the vertices as shown below and check whether the result is 0. If 
the result is not 0, the entire triangle is outside at least one of the six clip planes. In this 
case, processing can be stopped at that point, since the triangle is outside the screen. 

if (v0.cc & v1.cc & v2.cc) { 
    /* Processing stopped because triangle is outside screen */ 
} 

To determine whether the triangle v0, v1, v2 intersects the Near Plane, first use the above test to 
confirm that the triangle is not entirely outside the Near Plane, and then OR the cc values. This can be 
used to determine whether clipping processing is needed at the Near Plane. 

if ((v0.cc | v1.cc | v2.cc) & GZ_CC_NEAR) { 
    /* Perform Near clipping processing */ 
} 
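
The two tests can be combined into one classification step, sketched below for the host. The bit values are placeholders chosen for the example: the real GZ_CC_* values are defined in gz-sort.h and may differ, and the Demo names are hypothetical. 

```c
#include <stdint.h>

/* Illustrative bit assignments only; not the values from gz-sort.h. */
enum {
    DEMO_CC_LEFT = 0x01, DEMO_CC_RIGHT  = 0x02,
    DEMO_CC_TOP  = 0x04, DEMO_CC_BOTTOM = 0x08,
    DEMO_CC_NEAR = 0x10, DEMO_CC_FAR    = 0x20
};

typedef enum {
    DEMO_TRI_REJECT,      /* entirely outside one clip plane    */
    DEMO_TRI_NEAR_CLIP,   /* crosses the Near Plane             */
    DEMO_TRI_DRAW         /* no rejection or Near clip required */
} DemoTriState;

static DemoTriState demo_classify_triangle(uint8_t cc0, uint8_t cc1, uint8_t cc2)
{
    if (cc0 & cc1 & cc2)
        return DEMO_TRI_REJECT;      /* all vertices outside the same plane */
    if ((cc0 | cc1 | cc2) & DEMO_CC_NEAR)
        return DEMO_TRI_NEAR_CLIP;   /* at least one vertex is nearer than Near */
    return DEMO_TRI_DRAW;
}
```

Performing the AND test first means a triangle whose three vertices are all nearer than the Near Plane is rejected rather than clipped. 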

fog is used when performing FOG processing. Using the fog value for A in RGBA enables FOG 
processing. In Z-Sort Microcode, FOG is adjusted by the Viewport's Vp structure parameters. For details, 
refer to the sample program. 

The vertex data obtained with this GBI is used as shown below. The values actually 
assigned to each ZObject structure are the sx, sy, fog, and invw values. The invw or wi value 
can be used as the screen depth value for Z-Sort processing. 

[ For zShTri structure ] 

zVtxDest *v0, *v1, *v2; 
zShTri *shtri; 

/* Screen coordinate setting */ 
shtri->t.v[0].x = v0->sx; shtri->t.v[0].y = v0->sy; 
shtri->t.v[1].x = v1->sx; shtri->t.v[1].y = v1->sy; 
shtri->t.v[2].x = v2->sx; shtri->t.v[2].y = v2->sy; 

/* The settings below apply only when using Fog */ 
shtri->t.v[0].a = v0->fog; 
shtri->t.v[1].a = v1->fog; 
shtri->t.v[2].a = v2->fog; 

[ For zTxTri structure ] 

zVtxDest *v0, *v1, *v2; 
zTxTri *txtri; 

/* Screen coordinate setting */ 
txtri->t.v[0].x = v0->sx; txtri->t.v[0].y = v0->sy; 
txtri->t.v[1].x = v1->sx; txtri->t.v[1].y = v1->sy; 
txtri->t.v[2].x = v2->sx; txtri->t.v[2].y = v2->sy; 

/* Texture correction parameter setting */ 
txtri->t.v[0].a = v0->fog; 
txtri->t.v[1].a = v1->fog; 
txtri->t.v[2].a = v2->fog; 

gSPZSetAmbient (Gfx *gp, u32 umem, Ambient *ambient); 
gSPZSetDefuse (Gfx *gp, u32 umem, u32 lid, Light *defuse); 

umem      head address of the reserved light data area 
ambient   pointer to Ambient light structure 
lid       Defuse light number (0, 1, ...) 
defuse    pointer to Defuse light structure 

These GBIs write Ambient light (environment light) data or Defuse light (planar diffused light) data to the 
user area. The light data area must be reserved in advance in the user area. Its size depends on the 
number of Defuse lights and whether environment mapping is used. It is calculated as follows. 

(Light data area size) = 
    8 + 24 * (number of Defuse lights) + ((environment mapping) ? 48 : 0); 

In libz-sort of the sample program cubes-1, since three Defuse lights and environment mapping are 
used, the 128 bytes from 1920 to 2047 are reserved for the lights. 
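
The size formula can be written as a one-line host-side function. The function name is hypothetical; only the constants 8, 24, and 48 come from the formula above. 

```c
#include <stdint.h>

/* Light data area size in bytes:
 * 8 (Ambient) + 24 per Defuse light + 48 if environment mapping is used. */
static uint32_t demo_light_area_size(uint32_t num_defuse, int env_map)
{
    return 8u + 24u * num_defuse + (env_map ? 48u : 0u);
}
```

For the cubes-1 configuration (three Defuse lights plus environment mapping) this gives 8 + 72 + 48 = 128 bytes, which is exactly the range 1920-2047. 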

Fast3D macros can be used to set the Ambient and Defuse structures. When there are two Defuse 
lights, they are set using gdSPDefLights2, as shown in the example below. 

/* Light parameter */ 
static Lights2 scene_light = 
    gdSPDefLights2 (0x20, 0x20, 0x20,               /* Ambient  */ 
                    0xe0, 0xe0, 0xe0, 0, 40, 80,    /* Defuse 0 */ 
                    0x40, 0x00, 0x00, 0, 80, 40);   /* Defuse 1 */ 

/* Load light parameter */ 
gSPZSetAmbient (gp++, 1920, &scene_light.a); 
gSPZSetDefuse (gp++, 1920, 0, &scene_light.l[0]); 
gSPZSetDefuse (gp++, 1920, 1, &scene_light.l[1]); 

gSPZSetLookAt (Gfx *gp, u32 umem, u32 lnum, LookAt *lookat) 

umem      head address of the reserved light data area 
lnum      number of Defuse lights 
lookat    pointer to LookAt structure 

This GBI writes the LookAt structure data that constitutes the parameter for environment mapping to the 
light data area. The light data area must be reserved in advance in the user area. Refer to the explanation 
of gSPZSetAmbient/gSPZSetDefuse for further details. 

Z-Sort Microcode supports tex_gen_linear in the tex_gen and tex_gen_linear processing modes 
of the Fast3D-compatible microcode for environment map processing. It is already set up for tex_gen 
processing. 

Although the functions guLookAtReflect and guLookAtHilite in the gu library of the N64 OS can 
be used to set the LookAt structure, part of the result differs from what Z-Sort requires. The macro 
guZFixLookAt is available for correction; apply it after setting LookAt using the library functions. 

Shown below is the data write processing when gdSPDefLights2 is used for two Defuse lights. The 
lights are set and the reflection is mapped using guLookAtReflect. 

/* Light parameter */ 
static Lights2 scene_light = 
    gdSPDefLights2 (0x20, 0x20, 0x20,               /* Ambient  */ 
                    0xe0, 0xe0, 0xe0, 0, 40, 80,    /* Defuse 0 */ 
                    0x40, 0x00, 0x00, 0, 80, 40);   /* Defuse 1 */ 

/* Make reflection parameter */ 
guLookAtReflect (&dynamicp->viewing, &dynamicp->lookat, 
                 0, 0, 1000, 0, 0, 900, 0, 1, 0); 
guZFixLookAt (&dynamicp->lookat); 

/* Load light parameters */ 
gSPZSetAmbient (gp++, 1920, &scene_light.a); 
gSPZSetDefuse (gp++, 1920, 0, &scene_light.l[0]); 
gSPZSetDefuse (gp++, 1920, 1, &scene_light.l[1]); 

/* Load reflection parameters */ 
gSPZSetLookAt (gp++, 1920, 2, &dynamicp->lookat); 

guZFixLookAt is defined in gz-sort.h as shown below. 

#define guZFixLookAt(lp) \ 
    { (lp)->l[1].l.col[1] = (lp)->l[1].l.colc[1] = 0x00; } 

The macro clears two elements to 0. (The gu library assigns 0x80 to them.) If 
you want to optimize your processing time, refer to the source file of the gu library function in the N64 
OS, under the /libultra/gu directory in the libultra sample program, and replace the library 
function with a corrected version. 

gSPZXfmLights (Gfx *gp, u32 mid, u32 lnum, u32 umem) 

mid     matrix with 3x3 element at upper left of ModelView matrix inverted 
lnum    number of lights to be processed 
umem    head address in user area that holds light data 

This GBI performs lighting pre-processing. It must be called after the light data, the ModelView 
matrix, or both have changed, and before gSPZLight or gSPZLightMaterial is called. The 
pre-processing makes the light data usable in the light calculations of gSPZLight 
and gSPZLightMaterial. Since light data rarely changes within one scene, this GBI is typically called 
when the ModelView matrix changes. 

To execute this GBI, the reverse rotation matrix of the ModelView matrix is necessary. For this, the 
matrix with the 3x3 element at upper left of the ModelView matrix inverted can usually be used. (The 
shading is sometimes off when scaling only certain axes, but this is not a notable problem.) 
gSPZMtxTrnsp3x3 is used for the inversion. 

lnum is basically the number of Defuse lights. This does not include the Ambient light. 
Also, lnum cannot be set to 0 in Z-Sort Microcode. To process only Ambient lighting, specify one black 
(RGB=0, 0, 0) Defuse light as a dummy. 

Environment map processing uses the equivalent of two Defuse lights. When expressing highlighting 
and reflection, load the environment map parameter to the light parameter area and set lnum to 
(2 + the number of Defuse lights). 

When using only the environment map without using lights (no Defuse lights), the dummy Defuse light 
described above is unnecessary. Specify 2 for lnum and call the GBI. 
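
The rules for choosing lnum can be summarized in a small host-side sketch. It reads the environment map rule as "2 + the number of Defuse lights", which is consistent with the value 4 used in this section for two Defuse lights plus the environment map; the function name is hypothetical. 

```c
/* Value to pass as lnum, under the reading described in the lead-in:
 *  - environment map in use: Defuse lights + 2
 *  - otherwise: Defuse lights, with one dummy black light if there are none
 *    (lnum may not be 0). */
static unsigned demo_xfm_lnum(unsigned num_defuse, int env_map)
{
    if (env_map)
        return num_defuse + 2u;
    return (num_defuse == 0u) ? 1u : num_defuse;
}
```

In the Ambient-only case the caller must still load the dummy black Defuse light; the function only reports that lnum is 1. 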

Shown below is a processing example with a changed ModelView matrix, the light data area head at 
address 1920, two Defuse lights, and the environment map. 

/* Set ModelView and MxP matrix */ 
gSPZSetMtx (gp++, GZM_MMTX, &dynamicp->modeling[i]); 
gSPZMtxCat (gp++, GZM_MMTX, GZM_PMTX, GZM_MPMTX); 

/* Xfm light data */ 
gSPZMtxTrnsp3x3 (gp++, GZM_MMTX); 
gSPZXfmLights (gp++, GZM_MMTX, 1920, 4); 

gSPZLight (Gfx *gp, u32 nsrc, u32 num, u32 cdest, u32 tdest) 
gSPZLightMaterial (Gfx *gp, u32 msrc, u32 nsrc, u32 num, u32 cdest, u32 tdest) 

msrc    head address in user area that stores material color data (color of vertices) 
nsrc    head address in user area that stores normal ray vector data 
num     number of normal ray vector data to be processed (multiple of 2) 
cdest   head address in user area that stores color values of vertices after light calculation 
tdest   head address in user area that stores texture coordinate values of vertices after 
        environment map calculation 

This GBI treats the data from the nsrc address in the user area as signed 8-bit normal ray vector 
values (nx, ny, nz). It calculates the lighting using the light parameters prepared by gSPZXfmLights. 
This provides the light color that corresponds to each normal ray vector. The vertex color is obtained by 
multiplying this light color by the material color, which is the color of the vertex itself, for each of the 
r, g, b elements. The calculated color values are stored at the cdest address in the user area. 

With gSPZLight, (r, g, b, a) = (255, 255, 255, 255) is used as the material color. As with Fast3D 
microcode, this means the vertex is colored with the light color. With gSPZLightMaterial, the data 
from the msrc address in the user area is used as unsigned 8-bit color data in the order (r, g, b, a). 

In addition, when LookAt structure data is set as the light data, the lighting calculation and environment 
map calculation are performed simultaneously. Texture coordinate values (S, T = 0.00 to 32.00) are 
output to the tdest address in the user area as the calculation results. Even when LookAt structure 
data is not set, an undefined value is output to tdest, so be careful that no needed data in the 
(num * 4)-byte area is destroyed. 

After cdest and tdest are output, 3 is added to nsrc and 4 is added to msrc, cdest, and tdest, 
and processing proceeds to the next vertex. Although the num normal ray vectors are processed 
consecutively, num must be an even number. If it is odd, (num+1) entries are output to cdest and tdest 
to make an even number. Meaningless data will be output to the output data position of entry num+1. 

The formats of the data values to be input and output using this GBI are defined as the 
zNorm, zColor, and zTxtr structures in the header file gz-sort.h, as follows. 

typedef struct { 
    s8 nx, ny, nz; 
} zNorm; 

typedef struct { 
    u8 r, g, b, a; 
} zColor_t; 

typedef union { 
    zColor_t n; 
    u32 w; 
} zColor; 

typedef struct { 
    s16 s, t; 
} zTxtr_t; 

typedef union { 
    zTxtr_t n; 
    u32 w; 
} zTxtr; 

Since the size of each structure is 3 or 4 bytes, pay special attention to the 8-byte alignment when 
transferring DMA using gSPZSetUMem. When the transfer size must be a multiple of 8, the DMA transfer 
size must be rounded off to a multiple of 8. 

The routine of this GBI is basically as follows. If you are careful not to overwrite the unprocessed nsrc 
and msrc with the output of cdest and tdest, it is possible to overlap these areas. 

for (i = 0; i < num; i++) { 
    (*cdest, *tdest) = CalcLight (*nsrc, *msrc); 
    nsrc  += 3; 
    msrc  += 4; 
    cdest += 4; 
    tdest += 4; 
} 

gSPZMixS16 (Gfx *gp, u32 src1, u32 src2, u32 num, u16 factor) 
gSPZMixS8 (Gfx *gp, u32 src1, u32 src2, u32 num, u16 factor) 
gSPZMixU8 (Gfx *gp, u32 src1, u32 src2, u32 num, u16 factor) 

src1      head address 1 in user area where data to be interpolated is stored; 
          also the head address in user area to which interpolation results are output 
src2      head address 2 where data to be interpolated is stored 
num       number of data (multiple of 8) 
factor    mix factor (u.15 format, value 0x0000-0x7fff) 

These GBIs perform linear interpolation between two values using the formula below. The s16, s8, and u8 
data types are handled by the respective GBI. 

(*src1) = (*src1) * factor + (*src2) * (1.0 - factor); 

In gSPZMixS16, src1 and src2 combined are limited to 16 bytes. Also, in gSPZMixS8 and 
gSPZMixU8, src1 and src2 combined are limited to 8 bytes. 

num must be a multiple of 8. If a number that is not a multiple of 8 is specified, meaningless data will 
be output to src1 until the count reaches a multiple of 8. 
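
The interpolation formula can be emulated on the host in fixed point, as in the sketch below. The function names are hypothetical, and the rounding behaviour of the actual microcode is not stated in this manual, so the truncation toward zero used here is an assumption. 

```c
#include <stdint.h>

/* Emulates (*src1) = (*src1)*factor + (*src2)*(1.0 - factor) for one s16
 * element, with factor in u.15 format (0x0000 = 0.0, 0x7fff is just below 1.0).
 * Assumption: the result is truncated toward zero. */
static int16_t demo_mix_s16(int16_t a, int16_t b, uint16_t factor)
{
    int32_t f   = (int32_t)(factor & 0x7fffu);
    int32_t acc = (int32_t)a * f + (int32_t)b * (0x8000 - f);
    return (int16_t)(acc / 0x8000);
}

/* Number of elements actually written: num rounded up to a multiple of 8. */
static uint32_t demo_mix_count(uint32_t num)
{
    return (num + 7u) & ~7u;
}
```

With factor = 0x4000 (0.5), mixing 100 and 200 gives 150; specifying num = 10 causes 16 elements to be written, the last 6 of which are meaningless. 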

Chapter 5 Other Processing 

GBI List 

This is a list of other GBIs. 

gSPZSegment        Sets segment 
gSPZSetSubDL       Registers/starts Sub DL 
gSPZLinkSubDL      Processes unprocessed Sub DL 
gSPZSendMessage    Sends message to CPU 
gSPZWaitSignal     Waits for signal from CPU 

GBI Functions 

This chapter explains the remaining GBIs. 

gSPZSetSubDL (Gfx *gp, Gfx *subdl) 

subdl   Sub DL head address 

This GBI registers the Sub DL and can only be processed in the Main DL. If a Sub DL has already 
started, registering a second Sub DL may not function properly. Register a Sub DL only after 
gSPZLinkSubDL has completed the processing of any Sub DL already registered. 

gSPZLinkSubDL (Gfx *gp) 

This GBI processes the Sub DL remaining to be processed and can only be processed in the Main DL. If 
the Sub DL has already ended, or if no Sub DL is registered, nothing happens. 

gSPZSendMessage (Gfx *gp) 

This GBI sends an SP_BREAK message to the CPU to inform the CPU of the status of Display List 
execution. 

When the DL execution status is unknown, the CPU cannot determine whether or not processing has 
been completed, forcing it to wait until RSP processing has ended (until the RSP message is received). 

Display List 
    ZObject A vertex calculation 
    ZObject B vertex calculation 
    ZObject C vertex calculation 
    (At this point, end message is sent to CPU) 

If the Display List is prepared as shown below using this GBI, the CPU can know whether or not the 
vertex calculation for each ZObject has ended and can immediately build ZObjects. 

Display List 
    ZObject A vertex calculation 
    gSPZSendMessage -> message sent to CPU 
    ZObject B vertex calculation 
    gSPZSendMessage -> message sent to CPU 
    ZObject C vertex calculation 
    gSPZSendMessage -> message sent to CPU 

Given the overhead of actually sending and receiving a message for each ZObject, as 
explained above, it may be better to send one message for multiple ZObjects rather than one per object. 
This is up to the user. 

Just as with the delivery of normal messages, a message queue is used for the CPU to receive the 
SP_BREAK message sent from the RSP. Create a message queue for the SP_BREAK message and connect it 
to OS_EVENT_SP_BREAK using osSetEventMesg. Although it is safer to set the size of the queue 
to greater than the number of gSPZSendMessage GBIs in the Display List, this is not necessary. As long as 
the number of pending SP_BREAK messages can be controlled, a smaller size presents no problem. 

In conventional microcode, rmonThread used this SP_BREAK message. Originally, the message was 
prepared for microcode break point processing when using the GameShop DEBUGGER. Because this 
function is now rarely used, the message has been made available to the user. As a result, when 
rmonThread is not used, no problem occurs. When it is used, note that the SP_BREAK message queue 
must be set after creating and starting rmonThread (after executing osStartThread on rmonThread). 

gSPZWaitSignal (Gfx *gp, zSignal *sig, u32 param) 

sig     pointer to Signal buffer 
param   Signal value (u32) 

This GBI waits until the CPU Signal value exceeds the param value. Since the Signal value from the 
CPU is updated through an RDRAM buffer, that buffer must be provided by the application itself. During 
execution of this GBI, the RSP determines whether or not the CPU has rewritten the buffer's Signal. If 
so, the Signal buffer in RDRAM is DMA-transferred to DMEM and compared to param. 

The following is a macro for rewriting the Signal value from the CPU. 

GZ_SENSIGNAL (zSignal *sig, u32 val) 

sig     pointer to Signal buffer 
val     new Signal value 

After the Signal value is rewritten to val, the RSP is notified that the change has occurred. 
Since the Signal value is an unsigned 32-bit variable, the smallest value is 0. 

With conventional microcode, the Display List is handed over to the RSP only after it is complete. In 
other words, the RSP cannot start processing until the Display List has been completely created. With 
this GBI, however, any portion already created can be sent to the RSP even if the Display List is not 
complete, i.e., the RSP can be made to wait until the rest of the Display List is created. When 
gSPZWaitSignal is combined with gSPZSendMessage, described earlier, simple synchronization between CPU 
and RSP processing is achieved, demonstrating the power of processing the Display List serially. 

Chapter 6 Compatibility With Other Microcodes 

About GBIs 

Z-Sort Microcode is not compatible with other Fast3D-compatible microcodes. However, some GBIs will 
be shared to allow microcode switching and self-loading with the F3DEX system. This section 
explains those GBIs that will likely belong to both microcodes. 

The names of the GBIs explained here basically have the new prefix gSPZ instead of the corresponding 
prefix gSP of the GBI macro in F3DEX. 

Z-Sort Microcode GBIs include a subset of F3DEX GBI Level 2. F3DEX GBI Level 2 is a new and 
improved GBI set offering faster RSP processing speeds in F3DEX Microcode and will be adopted in the 
upcoming F3DEX Microcode release. 

As a result, Level 2 is not compatible at the binary level with the GBIs adopted in F3DEX Microcode 
Version 1.23 or earlier. Thus, processing such as microcode switching and self-loading is difficult 
with those versions of the F3DEX microcode system. 

Since Z-Sort Microcode uses F3DEX GBI Level 2, F3DEX_GBI_2 must be 
defined by a #define statement or the compile option -D when using Z-Sort Microcode. 

Common GBIs 

gSPZSegment Sets segment 

gSPZPerspNormalize Sets perspective correction value 

gSPZSegment (Gfx *gp, u32 seg, u32 base) 

seg     segment number (0-15) 
base    segment base address 

This GBI sets the segment. Although processing by either the Main DL or a Sub DL is possible, when the 
same segment number has been rewritten in the Main DL or a Sub DL, problems can be expected when 
parallel processing is started. To avoid these problems, try as much as possible not to overlap the 
segments to be used in the Main and Sub DL. 

gSPZPerspNormalize (Gfx *gp, u16 persp) 

persp   perspective correction value 

This GBI sets the perspective correction value. It is the same as the gSPPerspNormalize GBI in 
F3DEX. 

Chapter 7 CPU Support Library 

In Z-Sort Microcode, building plane data from the vertex data on the screen, i.e., ZObject data, depends 
on the CPU. Using arithmetic operation GBI commands, 3D coordinate vertices can be transformed into 
screen coordinate vertices. The CPU's role is to connect these vertices to build polygons. The CPU 
performs other processing as well and, therefore, a CPU library must be created by the user to perform 
this processing. The library used in the sample program cubes-1 is explained below to provide a 
sample library. 

- Multiply model matrix by perspective transformation matrix 
- Calculate coordinate transformation/perspective transformation/screen depth 
  for model vertices 
- Determine whether there are vertices in the screen 
- Determine clipping/back plane 
- Construct ZObject data 
- Create ZObject list 

Chapter 8 Sample Programs 

The sample programs are installed under the /usr/src/PR/gZ-sort directory. 

zonetri/ 

This displays one quadrangle and is the simplest application of Z-Sort. 

cubes-1/ 

A wide variety of polygons can be drawn using Z-Sort Microcode. The general-purpose library 
libz-sort is created and data is sent to it for drawing. Near clipping and other processes are performed in 
the library. Its 2-pass processing, however, hinders performance. 
Chapter 1. Introduction

What is S2DEX Microcode? 

The S2DEX microcode has been developed to use Super NES-like sprite and BG functions on the 
Nintendo 64 (N64). Due to these functions, it is easier to create a game using sprites. Also, by treating 
drawing objects as sprites and BG, N64 programming is similar to the conventional sprite game 
programming. 

Features of S2DEX 
The Drawing Primitive 

Since S2DEX is designed specifically for processing 2D expressions, 3D primitive drawing for Fast3D 
and F3DEX is not supported. However, the following primitives can be drawn using S2DEX Microcode. 

Rectangle A -- gSPObjRectangle, gSPObjRectangleR (Copy Mode)

Size is fixed. Texture flipping (vertical / horizontal) and drawing in the copy mode are possible. Scale
change (magnifying / shrinking) and rotation are not possible. Texture interpolation display and subpixel
movement are not possible. Anti-aliasing processing is not possible. The texture must be loaded to
TMEM before drawing.

Rectangle B -- gSPObjRectangle, gSPObjRectangleR (1, 2 Cycle Mode)

Texture flipping is possible (vertical / horizontal). Drawing in 1, 2 cycle mode is possible. Texture
interpolation display and subpixel movement are possible. Anti-aliasing processing is possible. Scale
change (magnifying / shrinking) is possible, but rotation is not. The texture must always be loaded
to TMEM.

Sprite -- gSPObjSprite 

Scale change (magnifying / shrinking) and rotation are possible. Texture flipping is possible (vertical / 
horizontal). Texture interpolation display and subpixel movement are possible. AntiAlias processing is 
possible. Drawing in copy mode is not possible. Texture must always be loaded to TMEM. 

Background (BG) A -- gSPBgRectCopy

Scrolling in closed region (vertical / horizontal loop) is possible. Horizontal texture flipping is possible 
(not vertical texture flipping). Drawing is possible in copy mode only. Scale change (magnifying / 
shrinking) is not possible. Texture interpolation display and subpixel movement are not possible. 
AntiAlias processing is not possible. Drawing is done by loading the texture on DRAM to TMEM as 
necessary. 

Background (BG) B -- gSPBgRect1Cyc

A CPU-based emulation routine is available. Scale change (magnifying / shrinking) is possible.
Scrolling in closed region (vertical / horizontal loop) is possible. Horizontal texture flipping is possible
(not vertical texture flipping). Drawing can be performed in 1 cycle mode only. Texture interpolation
display is possible. Subpixel movement is possible in the horizontal direction only. Anti-aliasing
processing is not possible. Drawing is done by loading the texture on DRAM to TMEM as necessary.
From Old GBI...

The following functions can be used from the old graphics binary interface (GBI). 

• FillRectangle

• TextureRectangle

• TextureRectangleFlip

The following functions can not be used from the old GBI. 

• 1Triangle

• 2Triangle

• 1Quadrangle

There are not many similarities between S2DEX and the old Sprite2D Microcode. S2DEX is not an upgrade
to Sprite2D; it is a new microcode. Also, sprite library functions such as spInit() cannot be used in
combination with S2DEX because the sprite library uses 3D microcode. The S2DEX library is completely
different from the sprite library.



Self Loading Function 



As mentioned above, S2DEX is not capable of drawing 3D primitives. However, S2DEX has a 
microcode self loading function which is supported by F3DEX (Release 1.20 or later). Therefore, it is 
possible for S2DEX to draw 3D primitives by loading F3DEX microcode. 

DEBUG Information Output Function 

There are two types of S2DEX Microcode. One is installed for master ROM, and the other is for 
debugging. The Microcode for debugging is equipped with the following features. 

• Outputs a display list processing log

• If an illegal input value or illegal command is detected, stops the RSP and sends a report to the CPU

These functions are fully described later in this manual.

Passing Commands from RSP to RDP 

S2DEX only supports FIFO versions (same as the F3DEX series).

However, S2DEX requires a larger FIFO buffer than F3DEX. While this buffer had to be 0x300
bytes or larger for the F3DEX series, it has to be at least 0x800 bytes for S2DEX. Please be aware that
if you want the FIFO buffer to be shared by the F3DEX series and S2DEX, it must be at least 0x800
bytes to fulfill the S2DEX requirements.
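
The sizing rule above can be written down as a small helper. This is a minimal sketch: the macro and function names are hypothetical and are not part of the N64 libraries; only the two minimum sizes are taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum RSP-to-RDP FIFO buffer sizes quoted in the text (bytes). */
#define F3DEX_FIFO_MIN  0x300u
#define S2DEX_FIFO_MIN  0x800u

/* Hypothetical helper: smallest FIFO buffer that satisfies every
 * microcode family that will share it. */
static size_t fifo_min_size(int uses_f3dex, int uses_s2dex)
{
    size_t need = 0;
    assert(uses_f3dex || uses_s2dex);   /* at least one microcode is in use */
    if (uses_f3dex && need < F3DEX_FIFO_MIN) need = F3DEX_FIFO_MIN;
    if (uses_s2dex && need < S2DEX_FIFO_MIN) need = S2DEX_FIFO_MIN;
    return need;
}
```

A shared buffer is simply sized for the larger of the two requirements, so fifo_min_size(1, 1) gives 0x800.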

Note: The FIFO size required varies depending on the microcode. The sizes are as
noted above for the F3DEX series and S2DEX, while 0x188 bytes are necessary for
Fast3D.
Chapter 2 Compatibility with F3DEX

The GBI of S2DEX was derived from F3DEX, so there is no compatibility with the GBI of Fast3D.
When you use S2DEX, you need to define F3DEX_GBI before ultra64.h is included, just as for F3DEX.

Also, to use the GBI of S2DEX, you need to include the header file <PR/gs2dex.h>. Insert this include
directive after the include directive for <ultra64.h>.

Next, let's compare the GBI of S2DEX and the GBI of F3DEX. Simply put, you can consider that S2DEX 
does not support GBIs which deal with 3D primitives, 4x4 matrices, and light definition. 

The following refers to gSP* and gDP* only, but the same applies to gsSP* and gsDP*.

GBIs Supported by Both S2DEX and F3DEX 

The following GBIs are fully supported by both S2DEX and F3DEX, except as noted. 



DL Process Control
    gSPDisplayList (*)          gSPBranchList
    gSPEndDisplayList

Setting Up Segment
    gSPSegment (*)

Loading Microcode
    gSPLoadUcode*

Scissoring
    gDPSetScissor               gDPSetScissorFrac

Setting RDP Mode
    gSPSetOtherMode             gDPSetCycleType
    gDPSetTexturePersp          gDPSetTextureDetail
    gDPSetTextureLOD            gDPSetTextureLUT
    gDPSetTextureFilter         gDPSetTextureConvert
    gDPSetCombineKey            gDPSetColorDither
    gDPSetAlphaDither           gDPSetBlendMask
    gDPSetAlphaCompare          gDPSetDepthSource
    gDPSetRenderMode            gDPSetColorImage
    gDPSetDepthImage            gDPSetTextureImage
    gDPSetCombineMode

Setting Color Value, etc.
    gDPSetEnvColor              gDPSetBlendColor
    gDPSetFogColor              gDPSetFillColor
    gDPSetPrimColor             gDPSetPrimDepth
    gDPSetConvert               gDPSetKeyR
    gDPSetKeyGB

Loading to TMEM
    gDPSetTileSize              gDPLoadTile
    gDPSetTile                  gDPLoadTextureBlock*
    gDPLoadMultiBlock*          gDPLoadTextureTile*
    gDPLoadMultiTile*           gDPLoadTLUT_pal16
    gDPLoadTLUT_pal256

Primitives
    gDPFillRectangle            gDPScisFillRectangle
    gSPTextureRectangle         gSPScisTextureRectangle
    gSPTextureRectangleFlip

Sync Processing
    gDPFullSync                 gDPTileSync
    gDPPipeSync                 gDPLoadSync

NOOP
    gSPNoOp                     gDPNoOp
    gDPNoOpTag
GBIs Not Supported in S2DEX

The following GBIs are not supported by S2DEX.

Setting View
    gSPViewport                 gSPClipRatio
    gSPPerspNormalize

Matrix Operation
    gSPMatrix                   gSPPopMatrix
    gSPInsertMatrix             gSPForceMatrix

Vertex Operation
    gSPVertex                   gSPModifyVertex

Conditional Branch
    gSPCullDisplayList          gSPBranchLessZ*

Polygon Type Setting
    gSPSetGeometryMode          gSPClearGeometryMode
    gSPTexture                  gSPTextureL

Primitives
    gSP1Triangle                gSP2Triangles
    gSP1Quadrangle              gSPLine3D
    gSPLineW3D

Lighting
    gSPNumLights                gSPLight
    gSPLightColor               gSPSetLights[0-7]
    gSPLookAt*                  gDPSetHilite1Tile
    gDPSetHilite2Tile

Fog
    gSPFogFactor                gSPFogPosition

For Old Sprite2D Use
    gSPSprite2DBase             gSPSprite2DScaleFlip
    gSPSprite2DDraw

New GBIs

The following GBIs have been added to S2DEX.

BG Drawing
    gSPBgRectCopy               gSPBgRect1Cyc

Sprite Drawing
    gSPObjRectangle             gSPObjRectangleR
    gSPObjSprite

2D Matrix Operation
    gSPObjMatrix                gSPObjSubMatrix

Drawing Mode Setting
    gSPObjRenderMode

Load Texture Processing
    gSPObjLoadTxtr

Compound Commands
    gSPObjLoadTxRect            gSPObjLoadTxRectR
    gSPObjLoadTxSprite

Conditional Branch
    gSPSelectDL                 gSPSelectBranchDL

Precautions Regarding GBIs 

Changing Mode Using OtherMode 

When changing the mode in F3DEX with g[s]SPSetOtherMode, no more than a maximum of 31 bits
could be set with a single g[s]SPSetOtherMode command. This has been corrected in S2DEX so that
you can change 32 bits worth of parameters at once with a single command.




Chapter 3 S2DEX GBIs 

The following paragraphs contain detailed descriptions of the GBIs available in S2DEX. 



BG Drawing GBI 



S2DEX can easily create vertical and horizontal scroll surfaces in a closed area (this function was
included in the Super NES). Developing scroll games such as 2D Mario will be easier using this feature.

uObjBg Structure

The uObjBg structure holds the drawing information of a BG. A pointer to this structure is given as the
parameter of the BG drawing GBIs.

Strictly speaking, uObjBg is a union with 3 members. One exists only to align the structure to an
8-byte boundary and does not require attention. The remaining 2 are data structures matched to the
two BG drawing GBIs described below.

The structure for the BG drawing GBI that uses Copy mode is uObjBg_t, and the structure for the BG
drawing GBI that uses 1 Cycle mode is uObjScaleBg_t.

typedef union {
    uObjBg_t       b;
    uObjScaleBg_t  s;
    long long int  force_structure_alignment;
} uObjBg;
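
The alignment member can be illustrated in isolation. The following is a sketch only: the Demo* types are hypothetical stand-ins with the same 40-byte size, not the real S2DEX structures, and the exact alignment of long long int depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: 40 bytes each, naturally only 2-byte aligned. */
typedef struct { unsigned short a[20]; } DemoBg_t;
typedef struct { unsigned short a[20]; } DemoScaleBg_t;

typedef union {
    DemoBg_t      b;
    DemoScaleBg_t s;
    /* Never read or written; it only raises the union's alignment
     * to that of long long int. */
    long long int force_structure_alignment;
} DemoBg;

/* Returns nonzero when p lies on an 8-byte boundary. */
static int is_8byte_aligned(const void *p)
{
    assert(p != NULL);
    return ((uintptr_t)p & 7u) == 0;
}
```

Without the third member the union would only need the alignment of its u16 fields, so a statically allocated instance could land on an address the microcode cannot use.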

uObjBg_t Structure 

Members of the uObjBg_t structure can be divided into two groups (first half and second half).

The first half consists of the member variables to be set by the user. BG drawing can be controlled by
changing these variables. This first half is shared with the uObjScaleBg_t structure.

The second half consists of variables that the CPU calculates and stores in advance to help the
microcode. These member variables are set by calling the function guS2DInitBg(), with a pointer to the
uObjBg structure as the parameter. However, there is no need to call guS2DInitBg every time.

Since the second half's member variables are derived from the first-half variables imageLoad,
imageFmt, imageSiz, imageW, and frameW, guS2DInitBg needs to be called only immediately after
these variables are changed.

When uObjBg is used as a BG plane, these variables normally do not change very often. Therefore, it is
usually sufficient to call guS2DInitBg once before using the BG plane.

However, the uObjScaleBg_t members scaleW, scaleH, and imageYorig occupy the same storage in the
union as the second-half members of uObjBg_t, so writing to them overwrites those members. In this
situation, it will probably be necessary to call guS2DInitBg again.

The following is the definition of uObjBg_t in gs2dex.h. Its size is 40 bytes, and uObjBg
must be aligned to 8 bytes.

The first-half member variables are explained in the GBI sections. Please understand that the
arrangement of member variables is somewhat complicated in order to optimize RSP operation.
typedef struct {
    u16  imageX;      // The x-coordinate of the upper-left position of BG image (u10.5)
    u16  imageW;      // The width of BG image (u10.2)
    s16  frameX;      // The upper-left position of the transfer frame (s10.2)
    u16  frameW;      // The width of the transfer frame (u10.2)

    u16  imageY;      // The y-coordinate of the upper-left position of BG image (u10.5)
    u16  imageH;      // The height of BG image (u10.2)
    s16  frameY;      // The upper-left position of the transfer frame (s10.2)
    u16  frameH;      // The height of the transfer frame (u10.2)

    u64 *imagePtr;    // The texture address of the upper-left position of BG image
    u16  imageLoad;   // Which to use, LoadBlock or LoadTile
    u8   imageFmt;    // The format of BG image  G_IM_FMT_*
    u8   imageSiz;    // The size of BG image    G_IM_SIZ_*
    u16  imagePal;    // The palette number
    u16  imageFlip;   // Image horizontal flip. Flip using G_BG_FLAG_FLIPS.

    // All of the above are common with uObjScaleBg_t.
    // The user doesn't have to set the following since they are set within the
    // initialization routine, guS2DInitBg().

    u16  tmemW;       // The width of TMEM for 1 line's worth of the frame. The width
                      // is the Word size.
                      //   At LoadBlock, GS_PIX2TMEM(imageW/4, imageSiz)
                      //   At LoadTile,  GS_PIX2TMEM(frameW/4, imageSiz)+1
    u16  tmemH;       // The height of TMEM loadable at a time (s13.2). Stored as
                      // 4 times the value.
                      //   At the normal texture, 512/tmemW*4
                      //   At the CI texture,     256/tmemW*4
    u16  tmemLoadSH;  // The SH value
                      //   At LoadBlock, tmemSize/2-1
                      //   At LoadTile,  tmemW*16-1
    u16  tmemLoadTH;  // The TH value or the Stride value
                      //   At LoadBlock, GS_CALC_DXT(tmemW)
                      //   At LoadTile,  tmemH-1
    u16  tmemSizeW;   // The skip value of imagePtr for 1 line's worth of the image.
                      //   At LoadBlock, tmemW*2
                      //   At LoadTile,  GS_PIX2TMEM(imageW/4, imageSiz)*2
    u16  tmemSize;    // The skip value of imagePtr for one loading.
                      //   = tmemSizeW*tmemH
} uObjBg_t;           // 40 bytes



The initialization function guS2DInitBg is declared as follows.

void guS2DInitBg(uObjBg *bg);

This function is used for initializing the uObjBg structure (uObjBg_t). If the uObjBg data structure is
passed to a GBI without initialization, the S2DEX Microcode's GBI may not function properly.

Parameter:

bg      The pointer to the uObjBg structure.
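
To make the second-half formulas concrete, the following sketch evaluates them for the LoadBlock case. It is not the library's guS2DInitBg: the BgDerived type and both function names are hypothetical, and words_per_line is an assumed reading of GS_PIX2TMEM as "number of 64-bit TMEM words per line". tmemLoadTH is left out because GS_CALC_DXT is not defined in this chapter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the derived (second-half) members. */
typedef struct {
    uint16_t tmemW, tmemH, tmemLoadSH, tmemSizeW, tmemSize;
} BgDerived;

/* Assumption: 64-bit TMEM words needed for `pixels` texels of `bits` bits. */
static uint16_t words_per_line(uint16_t pixels, unsigned bits)
{
    return (uint16_t)(((uint32_t)pixels * bits) / 64u);
}

/* LoadBlock case only. imageW is in u10.2 format; is_ci selects the
 * 256-word limit that the comments give for CI textures. */
static BgDerived bg_derive_loadblock(uint16_t imageW, unsigned bits, int is_ci)
{
    BgDerived d;
    assert(bits == 4 || bits == 8 || bits == 16 || bits == 32);
    d.tmemW      = words_per_line((uint16_t)(imageW / 4), bits);
    assert(d.tmemW != 0);
    d.tmemH      = (uint16_t)(((is_ci ? 256 : 512) / d.tmemW) * 4);
    d.tmemSizeW  = (uint16_t)(d.tmemW * 2);
    d.tmemSize   = (uint16_t)(d.tmemSizeW * d.tmemH);
    d.tmemLoadSH = (uint16_t)(d.tmemSize / 2 - 1);
    return d;
}
```

For a 328-pixel-wide 16-bit image (imageW = 328 * 4), the formulas give tmemW = 82 words and tmemH = 24, that is, 6 lines per load stored as 4 times the value.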
uObjScaleBg_t Structure

Unlike uObjBg_t, the uObjScaleBg_t structure has no members whose values must be calculated by the
CPU in advance. All members are set directly by the user, and BG plane drawing is then controlled
accordingly.

In addition, when accessed through the uObjBg union, the uObjScaleBg_t structure's member variables
from imageX to imageFlip are shared with the uObjBg_t structure.

typedef struct {
    u16  imageX;      // The x-coordinate of the upper-left position of BG image (u10.5)
    u16  imageW;      // The width of BG image (u10.2)
    s16  frameX;      // The upper-left position of the transfer frame (s10.2)
    u16  frameW;      // The width of the transfer frame (u10.2)

    u16  imageY;      // The y-coordinate of the upper-left position of BG image (u10.5)
    u16  imageH;      // The height of BG image (u10.2)
    s16  frameY;      // The upper-left position of the transfer frame (s10.2)
    u16  frameH;      // The height of the transfer frame (u10.2)

    u64 *imagePtr;    // The texture address of the upper-left position of BG image
    u16  imageLoad;   // Which to use, LoadBlock or LoadTile
    u8   imageFmt;    // The format of BG image  G_IM_FMT_*
    u8   imageSiz;    // The size of BG image    G_IM_SIZ_*
    u16  imagePal;    // The palette number
    u16  imageFlip;   // Image horizontal flip. Flipped using G_BG_FLAG_FLIPS.

    // All of the above are common with uObjBg_t.

    u16  scaleW;      // The scale value of the x-direction (u5.10)
    u16  scaleH;      // The scale value of the y-direction (u5.10)
    s32  imageYorig;  // The drawing start-point on image (s20.5)
    u8   padding[4];
} uObjScaleBg_t;      // 40 bytes

gSPBgRectCopy

gSPBgRectCopy(Gfx *gdl, uObjBg *bg)
gsSPBgRectCopy(uObjBg *bg)

Gfx *gdl;       The display list pointer
uObjBg *bg;     The pointer to the drawing data structure of BG

g[s]SPBgRectCopy is the simplest of the BG drawing GBIs supplied by S2DEX. This GBI has the following
features.

• Scale change (magnifying / shrinking) is not possible.
• Scrolling in a closed region (making vertical / horizontal loop) is possible.
• Horizontal texture flipping is possible (not vertical texture flipping).
• Drawing is possible in copy mode only.
• Texture interpolation display and subpixel movement are not possible.
• Anti-aliasing is not possible.
• The GBI loads the texture data from DRAM to TMEM as necessary, then draws.
Designed for drawing in the copy mode, the biggest advantage of g[s]SPBgRectCopy is that it has the
fastest drawing speed. When using the GBI, CycleType must be set to the Copy mode.

S2DEX sends data from the BG image buffer to the actual frame buffer's rectangle region, shown below.
Scrolling becomes possible by establishing the relationship between the upper left hand corner of the
frame buffer rectangle region (transfer frame) and a point in the BG image buffer, specified by imageX
and imageY. imageX and imageY can be specified in the (u10.5) format, but due to restrictions when
using the Copy mode, the values for imageX and imageY are limited to integer values.



[Figure: the BG image (size imageW x imageH, with the scroll point (imageX, imageY)) and the transfer
frame (upper left corner (frameX, frameY), size frameW x frameH) inside the color frame buffer]



The size of the BG image is set by imageW and imageH. The beginning address (the top left hand
corner) is specified by imagePtr. That is, you can consider the BG image to be a large texture
having width (imageW) and height (imageH) starting from imagePtr.

The BG image's width, imageW, must be aligned to 8 bytes. Since the actual values used for imageW and
imageH are in (u10.2) format, the values to be assigned must be multiplied by 4. The following chart
shows the constraints on imageW's value, taking the (u10.2) format into consideration (that is, after
multiplying by 4). There is no need to align imageH values.

When G_IM_SIZ_4b  : imageW is a multiple of 64
When G_IM_SIZ_8b  : imageW is a multiple of 32
When G_IM_SIZ_16b : imageW is a multiple of 16
When G_IM_SIZ_32b : imageW is a multiple of 8

For horizontal scrolling, imageW must be larger than frameW. The following values take the (u10.2)
format into consideration. For example, when G_IM_SIZ_16b, imageW must be at least 4 pixels larger
than frameW.

When G_IM_SIZ_4b  : frameW+64 <= imageW
When G_IM_SIZ_8b  : frameW+32 <= imageW
When G_IM_SIZ_16b : frameW+16 <= imageW
When G_IM_SIZ_32b : frameW+8  <= imageW


The size of the transfer frame is specified by frameW and frameH, and the position of the upper left
hand corner of the transfer frame on the screen is specified by frameX and frameY. The parameters
frameW and frameH are in (u10.2) format. It is possible to specify negative values for frameX and
frameY. If the transfer frame projects out of the scissors box specified by g[s]DPSetScissor, the
microcode will clip the excess portion.

No problem arises when the BG image is bigger than the transfer frame; however, if the transfer
frame is bigger than the BG image, proper operation is not guaranteed. Please be sure to keep the
transfer frame smaller than the BG image.
In addition, the right and left ends of the BG image are offset in the Y direction by 1. Specifically, a BG
image's right end pixel is at (imageW-1, n), and one pixel to the right is (0, n+1). This arrangement is
necessary to improve RDRAM access efficiency for loading texture. It is very important for application
developers to keep this in mind.

Texture format and size for a BG image are set by specifying imageFmt and imageSiz using the
macros G_IM_FMT_* and G_IM_SIZ_*, respectively. Also, when using a CI4 texture, assign the TLUT
number to imagePal.

There are two ways to load texture for a BG image: using LoadBlock and using LoadTile. Since
there are advantages and disadvantages for each method, S2DEX's GBI design allows the user to select
the proper method by setting a member variable (imageLoad). Depending on the situation, the user can
assign an appropriate value (G_BGLT_*) to imageLoad to use LoadBlock or LoadTile.

The value of imageLoad Meanings 

G_BGLT_LOADBLOCK Use LoadBlock 

G_BGLT_LOADTILE Use LoadTile 

When using LoadBlock, maximum performance can be gained under certain circumstances. However,
when certain conditions are not satisfied, LoadBlock cannot be used because processing overhead will
become too large. On the other hand, LoadTile can always perform at a certain level. We
recommend using LoadBlock when the maximum benefit is expected, and LoadTile in other
cases.

LoadBlock's use is limited by the width of the BG. When imageSiz is 16 bit, the possible values of
imageW usable for LoadBlock are the following:

4, 8,12,16,20,24,28,32,36,40, 

48, 64, 72, 76,100,108,128,144,152,164, 

200,216,228,256,304,328,432,456,512,684, 

820,912 

When imageSiz is 8 bit, the usable set of numbers for imageW can be obtained by doubling each
of the numbers above. Similarly, multiply each number by 4 when imageSiz is 4 bit, and multiply each
number by 1/2 when imageSiz is 32 bit. This is consistent with the chart in the N64 Programming
Manual, Chapter 12, Appendix A, "LoadBlock Line Limits". If the width of the BG image does not allow
the use of LoadBlock, LoadTile must be used.

In order to draw a transfer frame line by line, LoadBlock reads the entire line of the corresponding BG
image. Since scrolling BG requires a larger BG image for BG refresh, imageW must be greater than
frameW. For this reason, excess data will be loaded when using LoadBlock.

On the other hand, LoadTile loads necessary data only. Since the processing speed of LoadBlock is
faster than that of LoadTile, using LoadBlock is advantageous when the difference in loaded data is
only a few pixels. However, when imageW is much larger than frameW, the processing overhead could
become too high. The use of LoadTile is advantageous in this case. The user should choose the
command best suited for the given application.

As an example, let's assume we are using BG to cover the entire screen (320 x 240).

Since the transfer frame is the entire screen, frameW becomes 320 pixels. Reserving 8 pixels for the
BG refresh area, imageW is 328 pixels. In this case, the difference between frameW and imageW is
small, and using LoadBlock at 328 pixels is the best solution.

The GBI supports BG image flipping for the horizontal direction only. A texture image can be flipped by
assigning G_BG_FLAG_FLIPS to imageFlip. Assign 0 for normal display (no flipping).
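
The width rule can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, widths are assumed to be in pixels, and texel size is passed as a bit count. It converts the width to its 16-bit equivalent and looks that up in the list quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Widths usable with LoadBlock for 16-bit texels, as listed in the text. */
static const unsigned short kLoadBlockWidths16[] = {
    4, 8, 12, 16, 20, 24, 28, 32, 36, 40,
    48, 64, 72, 76, 100, 108, 128, 144, 152, 164,
    200, 216, 228, 256, 304, 328, 432, 456, 512, 684,
    820, 912
};

/* Returns nonzero if a BG image `width` pixels wide, with `bits` bits
 * per texel, may use LoadBlock according to the list. */
static int loadblock_width_ok(unsigned width, unsigned bits)
{
    size_t i;
    unsigned scaled;
    assert(bits == 4 || bits == 8 || bits == 16 || bits == 32);
    /* The line must hold a whole number of 16-bit units. */
    if ((width * bits) % 16u != 0)
        return 0;
    /* 8-bit: width/2, 4-bit: width/4, 32-bit: width*2. */
    scaled = width * bits / 16u;
    for (i = 0; i < sizeof kLoadBlockWidths16 / sizeof kLoadBlockWidths16[0]; i++)
        if (kLoadBlockWidths16[i] == scaled)
            return 1;
    return 0;
}
```

A caller would set imageLoad to G_BGLT_LOADBLOCK only when this check passes and the excess width over frameW is small, and fall back to G_BGLT_LOADTILE otherwise.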
gSPBgRect1Cyc

gSPBgRect1Cyc(Gfx *gdl, uObjBg *bg)
gsSPBgRect1Cyc(uObjBg *bg)

Gfx *gdl;       The display list pointer
uObjBg *bg;     The pointer to the BG drawing data structure

g[s]SPBgRect1Cyc is one of the BG drawing GBIs provided by S2DEX, whereby the BG screen can be
enlarged or reduced. The features of this GBI are listed below.

• Scale change (magnifying / shrinking) is possible.
• Scrolling in a closed region (making vertical / horizontal loop) is possible.
• Horizontal texture flipping is possible (not vertical texture flipping).
• Drawing in 1 Cycle mode only.
• Texture interpolation display is possible; subpixel movement is possible only in the
  horizontal direction.
• Anti-aliasing is not possible.
• The GBI loads the texture data from DRAM to TMEM, then draws.

Note: This GBI cannot be used in Copy mode.

The parameters necessary for drawing with g[s]SPBgRect1Cyc are the parameters required when
using g[s]SPBgRectCopy, discussed previously, plus the parameters scaleW, scaleH, and
imageYorig. The additional parameters are explained here.

The biggest difference between g[s]SPBgRect1Cyc and g[s]SPBgRectCopy is that it supports BG
scaling. BG scaling is controlled by the uObjScaleBg_t structure's member variables scaleW and
scaleH. This scaling is centered at the BG image's (imageX, imageY).

In other words, even when scaling has been performed, the BG image's (imageX, imageY) is drawn at the
position (frameX, frameY) in the frame buffer, just as if scaling had not been done. (However, if
horizontal flipping has been performed, it is drawn at the position (frameX+frameW-1, frameY).)

In addition, when magnifying, the image is clipped by the frame size. Conversely, when shrinking, the
frame is sometimes clipped by the image size. Refer to the S2DEX sample program for more about this.

However, frame clipping during shrinking can sometimes be slightly greater or lesser depending on
calculation error. When a precise size is required, calculate and set the values for frameW and frameH
on the CPU side.

Bilinear interpolation display is supported by g[s]SPBgRect1Cyc. When using bilinear interpolation
display, jagged lines in texels become less apparent in magnification compared with normal point
sampling display, giving a smoother appearance. However, this effect is less apparent in images which
are scaled down in size.

When bilinear interpolation is used, the RDP drawing performance decreases compared to when it is not
used. The rate of this decrease in performance is greater when a smaller number of image lines are
loaded in TMEM at one time. When drawing a 320x240 image in a 320x240 frame with no scaling is
compared to drawing a 640x480 image at 1/2 reduction, the share of overhead taken by bilinear
interpolation will be greater when shrinking the 640x480 image. This causes a substantial drop in
performance compared to when the 640x480 image is similarly reduced and displayed using point
sampling. Considering that the effects of bilinear interpolation diminish when used in reducing images,
as discussed above, you should probably consider switching to point sampling display when reducing an
image.
g[s]SPBgRect1Cyc draws an image by automatically dividing it into several subplanes. If the division
is done carelessly, the drawing result may develop unnatural wrinkles at the subplane boundaries.
This is especially noticeable when the image is scrolled. The member variable imageYorig has been
provided in uObjScaleBg_t to prevent these wrinkles. The value of imageYorig is the Y coordinate of
the origin for scaling, but it also serves as the division origin of the subplanes, which makes it
possible to prevent the wrinkles described above. Typically, imageYorig is handled as follows.

At initialization:
    Set imageYorig to the value of imageY.

When the value of scaleH changes:
    Set imageYorig to the value of imageY.

When imageX and imageY have been wrap processed:
    Perform the same processing on imageYorig that was performed on imageY.

When changing only imageY (change not accompanying wrap processing):
    Do not change imageYorig.

Based on the above, processing for an image which is being scrolled by dx and dy would be as follows.

/* Addition of scroll values. */
bg->s.imageX += dx;
bg->s.imageY += dy;

/* Wrap processing of the screen edge. */
if (bg->s.imageX < 0) {
    bg->s.imageX += bg->s.imageW;
    bg->s.imageY -= 32;
    bg->s.imageYorig -= 32;
}
if (bg->s.imageX >= bg->s.imageW) {
    bg->s.imageX -= bg->s.imageW;
    bg->s.imageY += 32;
    bg->s.imageYorig += 32;
}
if (bg->s.imageY < 0) {
    bg->s.imageY += bg->s.imageH;
    bg->s.imageYorig += bg->s.imageH;
}
if (bg->s.imageY >= bg->s.imageH) {
    bg->s.imageY -= bg->s.imageH;
    bg->s.imageYorig -= bg->s.imageH;
}
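
The listing above compares imageX and imageY with 0, but both are declared as u16, so a compiled version needs signed temporaries. The following self-contained sketch shows one way to do that. It is an interpretation, not the sample program's code: BgScroll is a hypothetical stand-in for the relevant uObjScaleBg_t members, and the sketch assumes that imageW and imageH (u10.2) must be shifted left by 3 to match the (u10.5) format of imageX and imageY, with 32 representing one line in that format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the uObjScaleBg_t members used when scrolling. */
typedef struct {
    uint16_t imageX;      /* u10.5 */
    uint16_t imageW;      /* u10.2 */
    uint16_t imageY;      /* u10.5 */
    uint16_t imageH;      /* u10.2 */
    int32_t  imageYorig;  /* s20.5 */
} BgScroll;

#define ONE_LINE 32       /* 1.0 with 5 fraction bits */

/* Scroll by (dx, dy), both given with 5 fraction bits, wrapping at the edges. */
static void bg_scroll(BgScroll *bg, int32_t dx, int32_t dy)
{
    int32_t w = (int32_t)bg->imageW << 3;   /* u10.2 -> 5 fraction bits */
    int32_t h = (int32_t)bg->imageH << 3;
    int32_t x = (int32_t)bg->imageX + dx;
    int32_t y = (int32_t)bg->imageY + dy;

    assert(w > 0 && h > 0);
    assert(dx > -w && dx < w && dy > -h && dy < h);  /* one wrap per call */

    /* Crossing the left or right edge moves one line up or down. */
    if (x < 0)  { x += w; y -= ONE_LINE; bg->imageYorig -= ONE_LINE; }
    if (x >= w) { x -= w; y += ONE_LINE; bg->imageYorig += ONE_LINE; }
    /* imageYorig receives the same wrap adjustment as imageY. */
    if (y < 0)  { y += h; bg->imageYorig += h; }
    if (y >= h) { y -= h; bg->imageYorig -= h; }

    bg->imageX = (uint16_t)x;
    bg->imageY = (uint16_t)y;
}
```

Note that imageYorig is adjusted only inside the wrap branches, which matches the rule that a plain change to imageY leaves imageYorig untouched.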

BG images can be flipped in the horizontal direction only with this GBI, and flipping works just as it
does in the Copy mode. The texture image can be flipped by assigning G_BG_FLAG_FLIPS to the member
variable imageFlip. For normal display (no flipping), assign 0.

When using this GBI, there are limitations on the value of the uObjScaleBg_t structure's member
variable imagePtr. Any position from the head of RDRAM to the 4096 byte position cannot be
specified as the value for imagePtr. This represents physical addresses 0x00000000 to 0x00000fff, in
which range imagePtr (after segment conversion) cannot be placed. Please keep this in mind.

This GBI is built into S2DEX Version 1.00 and later.

In addition, the function guS2DEmuBgRect1Cyc has been added, beginning with S2DEX Version 0.75.
This function emulates processing which is equivalent to gSPBgRect1Cyc by combining several GBIs,
such as gSPTextureRectangle. This can also be used for performing scalable BG drawing.
See Chapter 4 for details. 
The Sprite Drawing GBI

The sprite mentioned here corresponds to OBJECTS in Super NES programming. Sprites have been
used for drawing areas smaller than BG, and historically they have often been used as "player
characters". In S2DEX, magnifying, reducing, and rotating sprites are all possible. Sprites also allow
more natural expression through bilinear interpolation processing.

To support a sprite's rotation, a two dimensional coordinate conversion matrix is used. By setting the
matrix's elements, a sprite can be rotated freely. The matrix must be set before drawing a sprite. Also,
unlike the matrix for Fast3D or F3DEX, there is no matrix stack, so Push/Pop operations cannot be
performed. Matrix multiplication cannot be done either. Only the load operation is possible. (Please
refer to "2D Matrix Operation" on page 21.)

S2DEX specifications call for using separate GBIs for TMEM loading and sprite drawing. In other words,
before drawing a sprite, the texture used for the sprite must already be loaded using the texture load GBI.
(Please refer to "Texture Load GBI" on page 24.)

The sprite drawing mode can be divided into two categories, rotating sprites and non-rotating sprites.
For each case, the corresponding GBI does the processing.

The Drawing Mode    Corresponding GBI
No Rotation         g[s]SPObjRectangle, g[s]SPObjRectangleR
Rotation            g[s]SPObjSprite

uObjSprite Structure

The uObjSprite data structure holds a sprite's information. The pointer to the data structure will be
given to the sprite drawing GBI as a parameter.

typedef struct {
    s16  objX;         // The x-coordinate of the upper-left end of OBJ. (s10.2)
    u16  scaleW;       // Scaling of the width direction. (u5.10)
    u16  imageW;       // The width of the texture. (The length of the S
                       // direction.) (u10.5)
    u16  paddingX;     // Unused. Always 0.
    s16  objY;         // The y-coordinate of the upper-left end of OBJ. (s10.2)
    u16  scaleH;       // Scaling of the height direction. (u5.10)
    u16  imageH;       // The height of the texture. (The length of the T
                       // direction.) (u10.5)
    u16  paddingY;     // Unused. Always 0.
    u16  imageStride;  // The folding width of the texel. (In units of 64-bit word.)
    u16  imageAdrs;    // The texture starting position in TMEM. (In units of 64-bit
                       // word.)
    u8   imageFmt;     // The format of the texel. G_IM_FMT_*
    u8   imageSiz;     // The size of the texel. G_IM_SIZ_*
    u8   imagePal;     // The palette number.
    u8   imageFlags;   // The display flag.
} uObjSprite_t;        // 24 bytes

typedef union {
    uObjSprite_t   s;
    long long int  force_structure_alignment;
} uObjSprite;

Although the sequence of member variables is somewhat complicated, this is unavoidable to optimize
RSP processing (same as with uObjBg).






uObjMtx/uObjSubMtx Structures

S2DEX Microcode has an area to hold a 2D matrix for controlling a sprite's rotation. There are eight
parameters (A, B, C, D, X, Y, BaseScaleX, and BaseScaleY).

The uObjMtx data structure has one-to-one correspondence to this 2D matrix area, and the structure is
used for modifying the whole 2D matrix. Rotation operation using the 2D matrix is explained in
"gSPObjSprite" on page 20.

typedef struct {
    s32  A, B, C, D;   /* s15.16 */
    s16  X, Y;         /* s10.2 */
    u16  BaseScaleX;   /* u5.10 */
    u16  BaseScaleY;   /* u5.10 */
} uObjMtx_t;           /* 24 bytes */

typedef union {
    uObjMtx_t      m;
    long long int  force_structure_alignment;
} uObjMtx;

uObjSubMtx is a subset of uObjMtx, and is used for changing X, Y, BaseScaleX, and
BaseScaleY. The main use for uObjSubMtx is drawing a sprite using g[s]SPObjRectangleR.
Please refer to "gSPObjRectangleR" on page 19 for details.

typedef struct {
  s16 X, Y;           /* s10.2 */
  u16 BaseScaleX;     /* u5.10 */
  u16 BaseScaleY;     /* u5.10 */
} uObjSubMtx_t;       /* 8 bytes */

typedef union {
  uObjSubMtx_t  m;
  long long int force_structure_alignment;
} uObjSubMtx;

The eight elements of the 2D matrix (A, B, C, D, X, Y, BaseScaleX, and BaseScaleY) can be
referenced by g[s]SPObjSprite and g[s]SPObjRectangleR. However, neither GBI references all
eight elements (please refer to the chart below). X and Y are referenced by both.



GBI                    2D matrix elements referenced

g[s]SPObjSprite        A, B, C, D, X, Y
g[s]SPObjRectangleR    X, Y, BaseScaleX, BaseScaleY



gSPObjRectangle 

gSPObjRectangle(Gfx *gdl, uObjSprite *sp)
gsSPObjRectangle(uObjSprite *sp)

Gfx *gdl;          The display list pointer.
uObjSprite *sp;    The pointer to the structure of the sprite drawing data.

g[s]SPObjRectangle is one of the sprite drawing GBIs supplied by S2DEX, and is used for drawing
non-rotating sprites. Inside the RSP, a TextureRectangle command is created from the
input uObjSprite structure data and sent to the RDP.
The g[s]SPObjRectangle GBI draws a texture in the rectangular area defined by the upper-left
screen coordinate (objX, objY) and the lower-right screen coordinate
(objX+imageW/scaleW-1, objY+imageH/scaleH-1). The texture region drawn is defined by the
upper-left corner (0, 0) and the lower-right corner (imageW-1, imageH-1). If scaleW and
scaleH are 1 << 10, the texture is drawn at its original size, without scaling. Please refer to the
following figure.
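The corner arithmetic above mixes three fixed-point formats, so a worked sketch may help. It assumes the formats given in the uObjSprite listing (objX in s10.2, imageW in u10.5, scaleW in u5.10) and returns the lower-right coordinate in s10.2. The function name and the truncating rounding are assumptions made for illustration; the exact rounding performed by the microcode is not described here.

```c
#include <stdint.h>

/* Hypothetical helper: obj + image/scale - 1, with the result in s10.2.
 * The same function serves for X (objX, imageW, scaleW) and for
 * Y (objY, imageH, scaleH). scale must not be 0. */
static int32_t lower_right_s10_2(int16_t obj, uint16_t image, uint16_t scale)
{
    /* (u10.5 << 10) / u5.10 leaves the size in pixels with 5 fractional bits. */
    uint32_t size_u10_5 = ((uint32_t)image << 10) / scale;

    /* Drop 3 fractional bits to reach s10.2, then subtract one pixel (1 << 2). */
    return (int32_t)obj + (int32_t)(size_u10_5 >> 3) - (1 << 2);
}
```

For a 32-texel image at pixel 10 with scale 1.0, this gives 164 (pixel 41); with scale 2.0 the sprite covers 16 pixels and the result is 100 (pixel 25).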



[Figure: the texture area in TMEM, from (0,0) at imageAdrs to (imageW-1, imageH-1), is drawn to the
sprite area in the frame buffer, from (objX, objY) to (objX+imageW/scaleW-1, objY+imageH/scaleH-1).]



Also, when a sprite is drawn, the scissor box defined by gDPSetScissor is referenced, and the
drawing area is clipped automatically. Therefore, it is possible to set negative values for objX and objY.

The TMEM address corresponding to the origin (0,0) of the texture region can be specified by imageAdrs.
Normally, imageAdrs is set to the beginning of the TMEM loading location specified by the texture load
GBI. The GS_PIX2TMEM() macro is convenient for this operation. GS_PIX2TMEM(), which is
defined in gs2dex.h, converts a number of pixels to a TMEM address.

• GS_PIX2TMEM(pix,siz)

• pix: The number of pixels

• siz: The size of 1 texel. Specified by G_IM_SIZ_*

The horizontal width (folding width) at the time of the texture load is assigned to imageStride. This
is needed because the loaded texture width and the imageW of the sprite actually drawn are sometimes
different. Since this is also specified in TMEM address units, GS_PIX2TMEM() can be used.

One application of imageAdrs and imageStride is as follows. First, load several small textures
(sub-textures) into TMEM. The appropriate texture can then be chosen for drawing by
setting imageAdrs as shown below.

imageW      = (sub-texture width);
imageH      = (sub-texture height);
imageAdrs   = GS_PIX2TMEM((S-coordinate in TMEM) + (T-coordinate in TMEM) *
                          (texture width at load time), G_IM_SIZ_*);
imageStride = GS_PIX2TMEM((texture width at load time), G_IM_SIZ_*);

More specifically, prepare a large texture consisting of 4 textures, as follows: 



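[Figure: a 64 x 32 composite texture containing sub-textures A (32 x 32), B (16 x 16), C (16 x 16), and D (32 x 16).]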
Load this composite texture as a 64 x 32 texture, and when drawing a sprite, specify each sub-texture as
follows:

Sub-texture A: imageW      = 32;
               imageH      = 32;
               imageAdrs   = GS_PIX2TMEM(0*64+0, G_IM_SIZ_16b);
               imageStride = GS_PIX2TMEM(64, G_IM_SIZ_16b);

Sub-texture B: imageW      = 16;
               imageH      = 16;
               imageAdrs   = GS_PIX2TMEM(0*64+32, G_IM_SIZ_16b);
               imageStride = GS_PIX2TMEM(64, G_IM_SIZ_16b);

Sub-texture C: imageW      = 16;
               imageH      = 16;
               imageAdrs   = GS_PIX2TMEM(0*64+48, G_IM_SIZ_16b);
               imageStride = GS_PIX2TMEM(64, G_IM_SIZ_16b);

Sub-texture D: imageW      = 32;
               imageH      = 16;
               imageAdrs   = GS_PIX2TMEM(16*64+32, G_IM_SIZ_16b);
               imageStride = GS_PIX2TMEM(64, G_IM_SIZ_16b);
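To check the imageAdrs values above by hand, recall that TMEM addresses are counted in 64-bit words. The sketch below assumes that the conversion simply divides the pixel count by the number of texels per 64-bit word (16, 8, 4, or 2 for 4-, 8-, 16-, and 32-bit texels). pix_to_tmem_words, sub_texture_adrs, and the EX_SIZ_* constants are illustrative names rather than the header's definitions; real code should use GS_PIX2TMEM() from gs2dex.h.

```c
/* Illustrative texel-size codes: 0 = 4 bit, 1 = 8 bit, 2 = 16 bit, 3 = 32 bit. */
enum { EX_SIZ_4b = 0, EX_SIZ_8b = 1, EX_SIZ_16b = 2, EX_SIZ_32b = 3 };

/* Hypothetical conversion from a pixel count to a count of 64-bit words. */
static unsigned pix_to_tmem_words(unsigned pix, unsigned siz)
{
    return pix >> (4 - siz);
}

/* imageAdrs for a sub-texture whose upper-left texel is (s, t) inside a
 * 16-bit texture that was loaded with the given width (in texels). */
static unsigned sub_texture_adrs(unsigned s, unsigned t, unsigned load_width)
{
    return pix_to_tmem_words(s + t * load_width, EX_SIZ_16b);
}
```

For the 64 x 32 composite texture, sub_texture_adrs(32, 16, 64) yields 264 for sub-texture D, and pix_to_tmem_words(64, EX_SIZ_16b) yields an imageStride of 16.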

There is a limitation to this method, however. Because the format for storing data in TMEM differs
between odd-numbered and even-numbered lines, an odd value cannot be specified for
(T-coordinate in TMEM) in the calculation formula for imageAdrs.

When using g[s]SPObjRectangle, the format and size of the texture are specified by setting imageFmt
and imageSiz using the macros G_IM_FMT_* and G_IM_SIZ_*. Also, if a CI4 texture is used, specify
the TLUT number in imagePal.

g[s]SPObjRectangle supports texture pattern flipping in the S and T directions. The drawing direction
can be changed by setting the following values.

Value of imageFlags                  Drawing Effect

0                                    No flipping
G_OBJ_FLAG_FLIPS                     The inversion of the S direction (X)
G_OBJ_FLAG_FLIPT                     The inversion of the T direction (Y)
G_OBJ_FLAG_FLIPS|G_OBJ_FLAG_FLIPT    The inversion of the S (X) and T (Y) directions

g[s]SPObjRectangle can be used in 1-cycle, 2-cycle, and copy modes. Drawing in copy
mode is faster than in the other modes; however, copy mode has more drawing restrictions.

Copy mode does not support bilinear interpolation, subpixel processing, or enlarging/reducing in the X
direction. If these operations are attempted in copy mode, they may not be performed properly. In the
worst case, the RDP may become uncontrollable. We recommend selecting the proper mode for the
functions you need.

The drawing result of g[s]SPObjRectangle varies depending on the render mode settings, such as
bilinear interpolation. Please refer to "Setting the Object Render Mode" on page 22 for details.

g[s]SPObjRectangle does not reference the 2D matrix setting. For this reason, the 2D matrix setting
does not affect this GBI's drawing result.
gSPObjRectangleR

gSPObjRectangleR(Gfx *gdl, uObjSprite *sp)
gsSPObjRectangleR(uObjSprite *sp)

Gfx *gdl;          The display list pointer
uObjSprite *sp;    The pointer to the structure of the sprite drawing data

g[s]SPObjRectangleR is one of the sprite drawing GBIs provided by S2DEX. Like
g[s]SPObjRectangle, g[s]SPObjRectangleR is used for drawing non-rotating sprites. Unlike
g[s]SPObjRectangle, however, g[s]SPObjRectangleR changes the drawing screen coordinates by
referring to the 2D matrix.

g[s]SPObjRectangleR refers to X, Y, BaseScaleX, and BaseScaleY in the 2D matrix, and
determines the vertex coordinates of a sprite using the following formulas.

Upper-left coordinate   ( X + objX / BaseScaleX,
                          Y + objY / BaseScaleY )
Lower-right coordinate  ( X + (objX + imageW / scaleW) / BaseScaleX - 1,
                          Y + (objY + imageH / scaleH) / BaseScaleY - 1 )

To change the values of {X, Y, BaseScaleX, BaseScaleY}, use the g[s]SPObjSubMatrix GBI.
When X = Y = 0 and BaseScaleX = BaseScaleY = 1.0, the result is the same as using
g[s]SPObjRectangle.
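The vertex formulas can be sketched in floating point, ignoring the fixed-point formats that the microcode actually uses. The ExRect type and the rect_r function are hypothetical names, and obj_w and obj_h stand for imageW / scaleW and imageH / scaleH.

```c
/* Hypothetical floating-point model of the g[s]SPObjRectangleR vertices. */
typedef struct { double x0, y0, x1, y1; } ExRect;

static ExRect rect_r(double X, double Y, double bsx, double bsy,
                     double objX, double objY, double obj_w, double obj_h)
{
    ExRect r;
    r.x0 = X + objX / bsx;                    /* upper-left corner  */
    r.y0 = Y + objY / bsy;
    r.x1 = X + (objX + obj_w) / bsx - 1.0;    /* lower-right corner */
    r.y1 = Y + (objY + obj_h) / bsy - 1.0;
    return r;
}
```

With X = Y = 0 and both base scales at 1.0, rect_r(0, 0, 1, 1, 10, 20, 32, 32) gives the corners (10, 20) and (41, 51), which are the g[s]SPObjRectangle corners for the same sprite.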

By changing the values of {X, Y, BaseScaleX, BaseScaleY} in the 2D matrix, multiple sprites
can be moved or scaled as if they were one sprite.

For example, consider three sprites A, B, and C arranged as follows:

     32     32     32
  +------+------+------+
32|  A   |  B   |  C   |
  +------+------+------+

and set the (objX, objY) data as follows.

A: (objX, objY) = ( 0<<2, 0<<2)
B: (objX, objY) = (32<<2, 0<<2)
C: (objX, objY) = (64<<2, 0<<2)



Now, by changing X and Y in this example, the three Sprites will move as one sprite. 

However, because of calculation errors (in multiplication, for example), gaps are sometimes
created between A and B or between B and C. To solve this problem, overlap the adjacent sprites
slightly (see below).

B: (objX, objY) = ((32<<2)-2, 0<<2)
C: (objX, objY) = ((64<<2)-4, 0<<2)

This completes the explanation of the differences between g[s]SPObjRectangleR and
g[s]SPObjRectangle. For other features of g[s]SPObjRectangleR, please refer to
"g[s]SPObjRectangle" on page 17.

gSPObjSprite

gSPObjSprite(Gfx *gdl, uObjSprite *sp)
gsSPObjSprite(uObjSprite *sp)

Gfx *gdl;          The display list pointer
uObjSprite *sp;    The pointer to the structure of the sprite drawing data

g[s]SPObjSprite is one of the sprite drawing GBIs provided by S2DEX. This GBI is used for drawing
rotating sprites. To rotate a sprite, use {A, B, C, D, X, Y} of the 2D matrix. g[s]SPObjMatrix is
used for setting these elements of the 2D matrix. (Please refer to "gSPObjMatrix" on page 21.)

A point (x, y) on a non-rotating sprite will move to the point (x' , y') by performing 2D matrix 
multiplication as follows. 

x'=A*x+B*y+X 
y'=C*x+D*y+Y 

Each vertex of the sprite moves, and the sprite is drawn in the new region defined by the new vertices.
If {A, B, C, D} of the 2D matrix is set to the following rotation matrix, the sprite is rotated by the angle T.

| A  B |   |  cosT  sinT |
| C  D | = | -sinT  cosT |

In this case, the sprite rotates about the screen coordinate (X, Y). If scaling is to be added,
multiply each element of {A, B, C, D} by the scale value.
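As a sketch of the matrix set-up, the helper below converts a rotation (given by its cosine and sine, so that no math library is needed) and a uniform scale into s15.16 values for A, B, C, and D, and applies the transform to a point. The names ExMtx, make_rotation, transform_x, and transform_y are hypothetical; only the formulas come from the text above, and the truncating conversion is an assumption.

```c
#include <stdint.h>

typedef struct { int32_t A, B, C, D; } ExMtx;   /* each element in s15.16 */

/* Convert a real number to s15.16, truncating toward zero. */
static int32_t to_s15_16(double v) { return (int32_t)(v * 65536.0); }

/* A = s*cosT, B = s*sinT, C = -s*sinT, D = s*cosT */
static ExMtx make_rotation(double cos_t, double sin_t, double scale)
{
    ExMtx m;
    m.A = to_s15_16( scale * cos_t);
    m.B = to_s15_16( scale * sin_t);
    m.C = to_s15_16(-scale * sin_t);
    m.D = to_s15_16( scale * cos_t);
    return m;
}

/* x' = A*x + B*y + X, in whole pixels. */
static int32_t transform_x(const ExMtx *m, int32_t x, int32_t y, int32_t X)
{
    return (int32_t)(((int64_t)m->A * x + (int64_t)m->B * y) / 65536) + X;
}

/* y' = C*x + D*y + Y, in whole pixels. */
static int32_t transform_y(const ExMtx *m, int32_t x, int32_t y, int32_t Y)
{
    return (int32_t)(((int64_t)m->C * x + (int64_t)m->D * y) / 65536) + Y;
}
```

With cosT = 0 and sinT = 1 (a quarter turn), the point (1, 0) maps to (X, Y - 1), so the sprite's horizontal edge becomes vertical, as expected for a rotation about (X, Y).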

By changing (objX, objY), the rotation center of a sprite can be changed. If objX=objY=0, the
sprite's rotation center is its upper-left vertex. If you wish to rotate a sprite about its center,
set objX and objY as follows.

objX = -(imageW/scaleW)/2;
objY = -(imageH/scaleH)/2;

Also, as with g[s]SPObjRectangleR, multiple sprites can be rotated as if they were one sprite by
adjusting the values of objX and objY. Here too, we recommend drawing the sprites so that they
overlap slightly, to eliminate gaps caused by calculation errors.

By setting (A = D = 1.0, B = C = 0.0), a non-rotating sprite's location will coincide with that of a sprite
drawn with g[s]SPObjRectangleR with BaseScaleX = BaseScaleY = 1.0. We recommend
drawing non-rotating sprites with g[s]SPObjRectangle, and using g[s]SPObjSprite for rotating
sprites. Since g[s]SPObjSprite uses two polygons in combination for drawing, it requires more
RSP/RDP processing than g[s]SPObjRectangleR.

Also, when g[s]SPObjSprite is used for a non-rotating sprite, a magnified sprite may not
coincide with one drawn by g[s]SPObjRectangle. This is unavoidable since the drawing
methods are different (polygon combination vs. rectangle drawing).

The texture settings for a sprite are the same as for g[s]SPObjRectangle. Please refer
to the appropriate section above.

2D Matrix Operation 

As mentioned above, the S2DEX microcode uses a 2D matrix as a drawing parameter. Several GBIs are
provided for the purpose of modifying this 2D matrix.

gSPObjMatrix 

gSPObjMatrix(Gfx *gdl, uObjMtx *mtx)
gsSPObjMatrix(uObjMtx *mtx)

Gfx *gdl;          The display list pointer
uObjMtx *mtx;      The pointer to the 2D matrix structure

This GBI loads the 2D matrix parameters in the uObjMtx structure into the 2D matrix area in the RSP.
Usually, it is used for a rotating sprite.

Since only six matrix elements (A, B, C, D, X, Y) are needed for rotation processing, it may appear that
there is no need to transfer the entire 2D matrix. However, all 24 bytes, including {BaseScaleX,
BaseScaleY}, are transferred, because transfers from main memory to the RSP matrix region must be
made in 8-byte units.

For this reason, the values of BaseScaleX and BaseScaleY are always overwritten. If you are not
using these parameters (that is, not using g[s]SPObjRectangleR immediately after calling gSPObjMatrix),
we recommend assigning the default value of 1024 (1.0 in u5.10 format) to BaseScaleX and
BaseScaleY.
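A minimal sketch of such a default matrix follows, with the typedefs and the structure repeated locally so that the fragment stands alone (real code takes them from the N64 headers). The variable name identity_mtx is illustrative.

```c
#include <stdint.h>

/* Local stand-ins for the N64 integer types. */
typedef int32_t  s32;
typedef int16_t  s16;
typedef uint16_t u16;

typedef struct {
    s32 A, B, C, D;      /* s15.16 */
    s16 X, Y;            /* s10.2  */
    u16 BaseScaleX;      /* u5.10  */
    u16 BaseScaleY;      /* u5.10  */
} uObjMtx_t;             /* 24 bytes */

typedef union {
    uObjMtx_t     m;
    long long int force_structure_alignment;
} uObjMtx;

/* No rotation, no translation, and both base scales left at 1.0. */
static uObjMtx identity_mtx = {
    {
        0x10000, 0, 0, 0x10000,   /* A, B, C, D: 1.0, 0, 0, 1.0 in s15.16 */
        0, 0,                     /* X, Y                                 */
        1 << 10, 1 << 10          /* BaseScaleX, BaseScaleY: 1.0 in u5.10 */
    }
};
```

Starting from a matrix like this and changing only A, B, C, D, X, and Y keeps BaseScaleX and BaseScaleY at a harmless value each time the whole matrix is transferred.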

gSPObjSubMatrix

gSPObjSubMatrix(Gfx *gdl, uObjSubMtx *mtx)
gsSPObjSubMatrix(uObjSubMtx *mtx)

Gfx *gdl;            The display list pointer
uObjSubMtx *mtx;     The pointer to the 2D matrix structure

g[s]SPObjSubMatrix loads the data in the uObjSubMtx structure into the 2D matrix region of the RSP.
The uObjSubMtx structure is a subset of uObjMtx, and holds the values of the 2D matrix
elements {X, Y, BaseScaleX, BaseScaleY} used by g[s]SPObjRectangleR.

This GBI changes only the 2D matrix elements {X, Y, BaseScaleX, BaseScaleY} that correspond to the
members of the uObjSubMtx structure; it does not affect the values of {A, B, C, D}.

This GBI is used mainly in conjunction with g[s]SPObjRectangleR.

Setting the Object Render Mode 

Many drawing parameters exist in the RDP, which control sprite/BG drawing. Depending on the RDP 
mode, polygon drawing and rectangle drawing processes are affected in some subtle ways. For 
example, by setting bilinear interpolation on and off, texture coordinates will vary by 0.5. S2DEX 
Microcode has been designed to correct these effects at the RSP to minimize the user's efforts to get 
around these problems. The RSP's correction process corresponds to the RDP's mode. We call the 
RSP's correction mode "Object render mode" (or OBJ render mode). 

Automatic selection of this mode would increase the processing overhead of the RSP, so currently
only copy mode and 1-cycle/2-cycle mode are handled automatically. For other modes, the RSP must
be informed by means of a GBI. In addition to correcting the effects caused by changing the
RDP's mode, the current Object render mode also has rendering functions of its own. These are
described below.

gSPObjRenderMode 

gSPObjRenderMode (Gfx *gdl, u32 mode) 
gsSPObjRenderMode (u32 mode) 

Gfx *gdl; The display list pointer 

u32 mode; The Object render mode 

g[s]SPObjRenderMode is used for changing the Object render mode of the RSP. Usually, the Object
render mode is set based on the display mode.

The flags used are shown below. If multiple settings are required, combine the flags using the OR
operator. However, G_OBJRM_SHRINKSIZE_1 and G_OBJRM_SHRINKSIZE_2 cannot be used at the
same time.

Macro Name               Function

G_OBJRM_NOTXCLAMP        does not perform the clamp operation for the peripheral part of the texture
G_OBJRM_BILERP           switches bilinear interpolation on
G_OBJRM_SHRINKSIZE_1     cuts 0.5 texel around the image
G_OBJRM_SHRINKSIZE_2     cuts 1.0 texel around the image
G_OBJRM_WIDEN            expands the image by 3/8 texel
Each flag is explained in detail below:

G_OBJRM_NOTXCLAMP

When a texture is placed on a sprite, the following relationships hold among the texture size (imageW
and imageH), the scale values (scaleW and scaleH), and the sprite size (objW and objH):

objW = imageW / scaleW;
objH = imageH / scaleH;

When the texture is placed on the sprite, the region (0,0)-(imageW-1, imageH-1) in texture
coordinates is displayed on the sprite. However, texture slightly outside of this region
may sometimes be displayed at the outermost edge of the sprite.

To prevent this from occurring, the RSP performs a clamping operation on the excess texture outside of
the defined region. For details on this clamping operation, please refer to Chapter 12 of the N64
Programming Manual, "Texture Mapping".

The G_OBJRM_NOTXCLAMP flag causes the RSP not to perform this clamping operation. Normally it is
not necessary to set this flag.

G_OBJRM_BILERP

This flag is set when using texture bilinear interpolation. As explained above, the texture
discrepancy of 0.5 due to bilinear interpolation is corrected by setting this flag.

Also, when this flag is ON, the RSP supports internal image movement in subpixel units, using bilinear
interpolation. As a result, a sprite can be moved in 1/4-pixel units.

G_OBJRM_SHRINKSIZE_1 

When combining multiple bilinear interpolated Sprites and treating them as one large bilinear 
interpolated sprite, care must be taken to assure continuity of the images at boundary lines. To maintain 
the continuity between the images, it is necessary to overlap each Sprite's texture by one line. If this is 
done, 0.5 texel (denoted by # in the chart below) from outer edge will become unnecessary, since this 
portion will be covered by the adjacent sprite. 



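[Figure: texel grid of a sprite texture, with the outer 0.5 texel on each edge marked by # and the inner texels 1 texel wide.]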
When the G_OBJRM_SHRINKSIZE_1 flag is ON, the RSP shrinks the sprite's drawing image by
eliminating 0.5 texel before drawing the texture image. The texture image shrinks by 0.5 texel, but the
upper-left corner coordinate does not change. The resultant drawing becomes:

[Figure: sprite drawing area with G_OBJRM_SHRINKSIZE_1 ON and OFF. Labels: (objX, objY), 1/scaleX, 1/scaleY.]

G_OBJRM_SHRINKSIZE_2

This is similar to G_OBJRM_SHRINKSIZE_1. The only difference is that the amount of image shrinkage
is doubled (1 texel from the outer edge).

This flag is used when adjacent sprites' texels are overlapped by two lines for better continuity in
subpixel processing.

G_OBJRM_WIDEN

This expands the image by 3/8 texel in the positive S and T directions.

This flag is used to prevent blank spaces from opening at the seams when sprites are combined to
display a rotating object which is larger than TMEM.

The importance of this flag has decreased, because sprite rendering calculations are processed more
precisely in S2DEX Version 1.04 and later; however, the flag is still usable.

• G_OBJRM_ANTIALIAS 

• G_OBJRM_XLU 

RenderMode when Drawing Sprites 

The RenderMode of the RDP which needs to be set for rendering a sprite is defined in the header file
gs2dex.h. Please use these definitions when rendering a sprite.

For anti-aliasing off:

  Opaque sprite               G_RM_SPRITE*
  Semi-transparent sprite     G_RM_XLU_SPRITE*

For anti-aliasing on:

  Opaque sprite               G_RM_AA_SPRITE* (G_RM_RA_SPRITE*)
  Semi-transparent sprite     G_RM_AA_XLU_SPRITE*

When semi-transparent sprites are used with anti-aliasing on and two sprites are layered, the
edge portion of the lower sprite may sometimes affect the edge portion of the upper sprite.
Since this is unavoidable, please use G_RM_XLU_SPRITE if this is unacceptable.

The Texture Load GBI 

The sprite drawing process for S2DEX was described in the sprite GBI section. Here, we will describe 
the TMEM load process, which is another important operation. 
uObjTxtr Structure

In the Texture Load GBI, three different texture load types are processed by the same GBI. These three
types (methods) are distinguished by the member variable type of the uObjTxtr structure, which is
provided to the GBI. The three methods are shown below.

1. Texture load using LoadBlock

2. Texture load using LoadTile

3. TLUT load

A texture load using LoadBlock can be faster than a texture load using LoadTile; however, there is a
limitation on the loadable texture width. This limitation is the same as that of "LoadBlock"; please
refer to page 13 for details.

Corresponding to the three methods, three different data structures are defined. These data
structures are constructed in the same way, but have different member variable names. They are
combined into a union (the uObjTxtr structure).

1. Texture load structure uObjTxtrBlock_t for using LoadBlock

typedef struct {
  u32  type;      // type: G_OBJLT_TXTRBLOCK
  u64  *image;    // texture source address on DRAM
  u16  tmem;      // TMEM word address of loading destination (8byteWORD)
  u16  tsize;     // texture size specified by macro GS_TB_TSIZE()
  u16  tline;     // texture width specified by macro GS_TB_TLINE()
  u16  sid;       // Status ID { 0, 4, 8, or 12 }
  u32  flag;      // Status flag
  u32  mask;      // Status mask
} uObjTxtrBlock_t;    // 24 bytes

2. Texture load structure uObjTxtrTile_t for using LoadTile

typedef struct {
  u32  type;      // type: G_OBJLT_TXTRTILE
  u64  *image;    // texture source address on DRAM
  u16  tmem;      // TMEM word address of loading destination (8byteWORD)
  u16  twidth;    // texture width specified by macro GS_TT_TWIDTH()
  u16  theight;   // texture height specified by macro GS_TT_THEIGHT()
  u16  sid;       // Status ID { 0, 4, 8, or 12 }
  u32  flag;      // Status flag
  u32  mask;      // Status mask
} uObjTxtrTile_t;     // 24 bytes

3. TLUT load structure uObjTxtrTLUT_t

typedef struct {
  u32  type;      // type: G_OBJLT_TLUT
  u64  *image;    // texture source address on DRAM
  u16  phead;     // first TLUT area number, 256 <= phead <= 511
  u16  pnum;      // number of TLUT to be loaded - 1
  u16  zero;      // always 0
  u16  sid;       // Status ID { 0, 4, 8, or 12 }
  u32  flag;      // Status flag
  u32  mask;      // Status mask
} uObjTxtrTLUT_t;     // 24 bytes
The shared structure, uObjTxtr union

typedef union {
  uObjTxtrBlock_t  block;   // texture load parameter using LoadBlock
  uObjTxtrTile_t   tile;    // texture load parameter using LoadTile
  uObjTxtrTLUT_t   tlut;    // TLUT load parameter
  long long int    force_structure_alignment;
} uObjTxtr;

gSPObjLoadTxtr 

gSPObjLoadTxtr(Gfx *gdl, uObjTxtr *tx)
gsSPObjLoadTxtr(uObjTxtr *tx)

Gfx *gdl; The display list pointer 

uObjTxtr *tx; The pointer to the texture load data structure 

gSPObjLoadTxtr performs each loading operation by referring to the texture loading parameters which 
are held by the above-mentioned three structures. The three structures have the common member 
variables type, image, sid, flag, and mask. First, we will explain these five common member 
variables. 

type 

gSPObjLoadTxtr distinguishes each structure using the value of type, the structure's member 
variable. Each value of type and corresponding structure, and each operation is shown below. 

type Value             Structure          Operation

G_OBJLT_TXTRBLOCK      uObjTxtrBlock_t    texture load using LoadBlock
G_OBJLT_TXTRTILE       uObjTxtrTile_t     texture load using LoadTile
G_OBJLT_TLUT           uObjTxtrTLUT_t     loading of TLUT

image

The member variable image specifies the address in main memory of the texture data or TLUT data
to be loaded. This data must be 8-byte aligned.

sid, flag, and mask 

These three member variables are used to bypass the load operation when the texture in question is
already loaded. If the requested texture is already loaded, g[s]SPObjLoadTxtr does not perform the
load operation.

To determine the existence of the texture in question in TMEM using the RSP, the RSP must analyze the 
loading destination area for each texture load operation. This is time consuming, and not a very good 
option. 

In S2DEX, the loading destination area data are included in texture data structure. Therefore, rather 
than performing analysis using the RSP, simple calculation will determine whether or not the loading 
operation needs to be performed. 

For example, when texture data are loaded to TMEM, an ID which corresponds to the loaded texture can 
be written to a status area. By simply comparing the IDs when the next TMEM loading operation is 
performed, the loading question can be resolved rather easily. 

The loading decision method used by S2DEX is an extension of the above concept. When partial 
loading by dividing TMEM is performed, S2DEX can also make loading decisions for different parts of 
TMEM using two 32 bit variables (flag and mask); this makes partial loading possible. 

The RSP provides four 32-bit status variables in the status region. When the microcode starts up, these
variables are set to 0. sid determines which status variable to use, and can take one of the values
{0, 4, 8, 12}.
g[s]SPObjLoadTxtr actually makes the loading decision using the steps below.

1. Check the condition ((Status[sid] & mask) == flag).

2. If the result is true, assume that the texture is already loaded and terminate the loading
operation.

3. If the result is false, load the texture, and change Status[sid] to:

Status[sid] = (Status[sid] & ~mask) | (flag & mask);

The easiest way to use flag is to assign -1 (= 0xffffffff) to mask, and the texture's source data address
(= the value of the member variable "image") to flag. Provided that no other texture data starts from
the same address, this acts as a texture cache.

Also, when (flag & ~mask) != 0, the condition is always false, and the texture is always
loaded.

The next example divides TMEM into two areas and controls each area separately. Bits 31-16 of
Status[0] are assigned to the first half of TMEM, and bits 15-0 to the last half. A sequence
number is assigned to each texture. The value of sid is always 0.

   Load                                Area          flag          mask

A: texture 1                           0 to 255      0x00010000    0xffff0000
B: texture 2                           256 to 511    0x00000002    0x0000ffff
C: texture 3                           0 to 511      0x00030003    0xffffffff
D: texture 3 (only the last half)      256 to 511    0x00000003    0x0000ffff

At C, the entire texture 3 is loaded. Even if the load at A then changes the first half, the last half
of TMEM still holds the texture 3 data, so a request at D to load texture 3 into the last half does not
require an actual load.
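The decision steps and the split-TMEM example can be modelled on the host in a few lines. This is only a sketch of the logic described above, not the microcode itself; ex_status, ex_need_load, and the use of sid / 4 as an array index are assumptions made for illustration.

```c
#include <stdint.h>

/* The four 32-bit Status words, all 0 at start-up. */
static uint32_t ex_status[4];

/* Returns 1 if the texture would be loaded (and updates Status), or 0 if
 * it is assumed to be in TMEM already. sid must be 0, 4, 8, or 12. */
static int ex_need_load(unsigned sid, uint32_t flag, uint32_t mask)
{
    uint32_t *st = &ex_status[sid / 4];

    if ((*st & mask) == flag)
        return 0;                            /* already loaded: skip the load */

    *st = (*st & ~mask) | (flag & mask);     /* record the new TMEM contents  */
    return 1;
}
```

Starting from a cleared Status, issuing the requests C, A, and D from the table in that order returns 1, 1, and 0: after C and A, Status[0] holds 0x00010003, so the lower 16 bits still match the flag for D.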

S2DEX also provides the gSPSelectDL and gSPSelectBranchDL GBIs, which perform DL
branching using the same Status-based principle.

The other member variables of each structure are explained in the following paragraphs.
1. Texture load using LoadBlock (uObjTxtrBlock_t structure)

tmem 

The texture's loading destination TMEM address is assigned to tmem in DoubleWord units. Normally,
this loading address is also used as the value of imageAdrs in the uObjSprite structure. If this value
is to be specified in pixel units, the GS_PIX2TMEM() macro, described earlier, is useful.

tsize

The size information of the texture to be loaded is assigned to tsize. To obtain this value from the
texture size, use the GS_TB_TSIZE() macro.

GS_TB_TSIZE(pix,siz) : setting of tsize

pix: the number of texels to be loaded (= width of texture x height of texture)
siz: 1 texel size, specified by G_IM_SIZ_*

tline

The width information of the texture to be loaded is assigned to tline. Use the
GS_TB_TLINE() macro to obtain this value from the texture width.

GS_TB_TLINE(pix,siz) : setting of tline

pix: the number of texels in the texture width
siz: 1 texel size, specified by G_IM_SIZ_*
2. Texture load by LoadTile (uObjTxtrTile_t structure)

tmem

This member variable is the same as for load operations using LoadBlock. The TMEM texture load
destination address is assigned to tmem in DoubleWord units.

twidth

The width information of the texture to be loaded is assigned to twidth. Use the GS_TT_TWIDTH()
macro to obtain this value from the texture width.

GS_TT_TWIDTH(pix,siz) : setting of twidth

pix: texture width
siz: 1 texel size, specified by G_IM_SIZ_*

theight

The height information of the texture to be loaded is assigned to theight. Use the
GS_TT_THEIGHT() macro to obtain this value from the texture height.

GS_TT_THEIGHT(pix,siz) : setting of theight

pix: texture height
siz: 1 texel size, specified by G_IM_SIZ_*

3. TLUT load (uObjTxtrTLUT_t structure)

phead

The first TLUT area number is assigned to phead. This number is obtained by adding 256
to the normal palette ID. Therefore, the value ranges from 256 to 511. Use the GS_PAL_HEAD() macro
for this setting.

GS_PAL_HEAD(head) : setting of phead (add 256 to head)

head: first ID of TLUT to be loaded

pnum

A value representing "(the number of colors of the loaded TLUT) - 1" is assigned to pnum. Use the
GS_PAL_NUM() macro for this setting.

GS_PAL_NUM(num) : setting of pnum (num - 1)

num: the number of TLUT to be loaded

zero

This member is not used in uObjTxtrTLUT_t. However, to maintain compatibility with the other
structures, always assign 0 to zero.
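The two TLUT conversions are simple enough to state as code. The functions below follow the descriptions given above (add 256 to the head ID; subtract 1 from the count). The names ex_pal_head and ex_pal_num are illustrative; real code should use GS_PAL_HEAD() and GS_PAL_NUM().

```c
/* phead: first TLUT area number, in the range 256 to 511. */
static unsigned ex_pal_head(unsigned head)
{
    return head + 256;
}

/* pnum: number of TLUT entries to be loaded, minus 1. */
static unsigned ex_pal_num(unsigned num)
{
    return num - 1;
}
```

For a 16-color palette loaded at the first TLUT entry, ex_pal_head(0) is 256 and ex_pal_num(16) is 15.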

The following illustrates an example of the set-up for the three structures. 
1. RGBA16 Texture load using LoadBlock

uObjTxtr objTxtrBlock_RGBA16 = {
  G_OBJLT_TXTRBLOCK,                   /* type  */
  (u64 *)textureRGBA16,                /* image */
  GS_PIX2TMEM(0, G_IM_SIZ_16b),        /* tmem  */
  GS_TB_TSIZE(32*32, G_IM_SIZ_16b),    /* tsize */
  GS_TB_TLINE(32, G_IM_SIZ_16b),       /* tline */
  0,                                   /* sid   */
  (u32)textureRGBA16,                  /* flag  */
  -1                                   /* mask  */
};
2. CI4 Texture load using LoadTile

uObjTxtr objTxtrTile_CI4 = {
  G_OBJLT_TXTRTILE,                    /* type    */
  (u64 *)textureCI4,                   /* image   */
  GS_PIX2TMEM(0, G_IM_SIZ_4b),         /* tmem    */
  GS_TT_TWIDTH(32, G_IM_SIZ_4b),       /* twidth  */
  GS_TT_THEIGHT(32, G_IM_SIZ_4b),      /* theight */
  0,                                   /* sid     */
  (u32)textureCI4,                     /* flag    */
  -1                                   /* mask    */
};

3. TLUT load

uObjTxtr objTLUT_CI4 = {
  G_OBJLT_TLUT,                        /* type  */
  (u64 *)textureCI4pal,                /* image */
  GS_PAL_HEAD(0),                      /* phead */
  GS_PAL_NUM(16),                      /* pnum  */
  0,                                   /* zero  */
  0,                                   /* sid   */
  (u32)textureCI4pal,                  /* flag  */
  -1                                   /* mask  */
};

Compound Processing GBI 

In actual game development, combining the Texture Load GBI and the sprite drawing GBI is sometimes
advantageous for controlling sprites. S2DEX provides a mechanism to perform the two GBIs with one
GBI. The following is an explanation of this compound processing.

uObjTxSprite Structure

The uObjTxSprite structure, shown below, is constructed by combining the uObjTxtr
structure and the uObjSprite structure. A pointer to the uObjTxSprite structure is provided to the
compound processing GBI as the parameter.

typedef struct {
  uObjTxtr    txtr;
  uObjSprite  sprite;
} uObjTxSprite;    /* 48 bytes */

gSPObjLoadTxRect 

gSPObjLoadTxRect(Gfx *gdl, uObjTxSprite *txsp)
gsSPObjLoadTxRect(uObjTxSprite *txsp)

Gfx *gdl;              The display list pointer
uObjTxSprite *txsp;    The pointer to the texture load and sprite drawing data structure

The g[s]SPObjLoadTxRect GBI performs the Texture Load operation, and then draws a non-rotating
sprite.

Essentially, this command performs the two GBI operations g[s]SPObjLoadTxtr and
g[s]SPObjRectangle with one GBI. The results of (A) and (B) shown below are identical.

(A) gsSPObjLoadTxRect(txsp);

(B) gsSPObjLoadTxtr(&(txsp->txtr));
    gsSPObjRectangle(&(txsp->sprite));
gSPObjLoadTxRectR

gSPObjLoadTxRectR(Gfx *gdl, uObjTxSprite *txsp)
gsSPObjLoadTxRectR(uObjTxSprite *txsp)

Gfx *gdl;              The display list pointer
uObjTxSprite *txsp;    The pointer to the texture load and sprite drawing data structure

The g[s]SPObjLoadTxRectR GBI performs the Texture Load operation, and then draws a non-rotating
sprite referencing the 2D matrix.

Essentially, this command performs the two GBI operations g[s]SPObjLoadTxtr and
g[s]SPObjRectangleR with one GBI. The results of (A) and (B) shown below are identical.

(A) gsSPObjLoadTxRectR(txsp);

(B) gsSPObjLoadTxtr(&(txsp->txtr));
    gsSPObjRectangleR(&(txsp->sprite));

gSPObjLoadTxSprite 

gSPObjLoadTxSprite(Gfx *gdl, uObjTxSprite *txsp)
gsSPObjLoadTxSprite(uObjTxSprite *txsp)

Gfx *gdl;              The display list pointer
uObjTxSprite *txsp;    The pointer to the texture load and sprite drawing data structure

The g[s]SPObjLoadTxSprite GBI performs the Texture Load operation, and then draws a rotating
sprite.

Essentially, this command performs the two GBI operations g[s]SPObjLoadTxtr and g[s]SPObjSprite
with one GBI. The results of (A) and (B) shown below are identical.

(A) gsSPObjLoadTxSprite(txsp);

(B) gsSPObjLoadTxtr(&(txsp->txtr));
    gsSPObjSprite(&(txsp->sprite));

Conditional Branching GBI 

As explained earlier, S2DEX uses the RSP's Status to make texture load decisions. This section
explains the GBIs that use Status for display list (DL) branching and linking.

gSPSetStatus 

gSPSetStatus(Gfx *gdl, u8 sid, u32 val)
gsSPSetStatus(u8 sid, u32 val)

Gfx *gdl;   display list pointer

u8 sid;     Status ID { 0, 4, 8, or 12 }

u32 val;    The value to set

g[s]SPSetStatus assigns the value of val to the Status area (Status[sid]) specified by sid. The
Status value is referenced for texture load decisions and for conditional branching decisions.



gSPSelectDL 

gSPSelectDL(Gfx *gdl, Gfx *ldl, u8 sid, u32 flag, u32 mask)
gsSPSelectDL(Gfx *ldl, u8 sid, u32 flag, u32 mask)

Gfx *gdl;    display list pointer

Gfx *ldl;    display list to be linked

u8 sid;      Status ID { 0, 4, 8, or 12 }

u32 flag;    Status flag

u32 mask;    Status mask

g[s]SPSelectDL inspects Status[sid] using the same method used for texture load decision
making. Depending on the True/False result, another display list is called.

g[s]SPSelectDL determines whether or not to call the display list by going through the following steps.

• Check the condition (Status[sid] & mask) == flag

• If the result is true, finish the GBI without doing anything.

• If the result is false, change Status[sid] by performing:

      Status[sid] = (Status[sid] & ~mask) | (flag & mask);

  and call display list "ldl".
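
The steps above can be sketched in C on the CPU as follows. This is only a model of the decision and not the microcode itself. The names status_area and select_dl_model are hypothetical, and treating sid (0, 4, 8 or 12) as a byte offset into four 32-bit Status words is an assumption made for this sketch.

```c
#include <stdint.h>

/* Model of the Status area as four 32-bit words. Treating sid as a byte
 * offset into this area is an assumption of this sketch. */
static uint32_t status_area[4];

/* Returns 1 if the display list would be called, or 0 if the GBI would
 * finish without doing anything. The Status word is updated only when
 * the function returns 1. */
int select_dl_model(uint8_t sid, uint32_t flag, uint32_t mask)
{
    uint32_t *status = &status_area[sid / 4];

    if ((*status & mask) == flag) {
        return 0;                       /* condition true: nothing to do */
    }
    *status = (*status & ~mask) | (flag & mask);
    return 1;                           /* condition false: call the list */
}
```

Because the Status word is rewritten whenever the list is called, a second command with the same sid, flag and mask finds the condition true and skips the list, provided flag has no bits outside mask.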

gSPSelectBranchDL 

gSPSelectBranchDL(Gfx *gdl, Gfx *bdl, u8 sid, u32 flag, u32 mask)
gsSPSelectBranchDL(Gfx *bdl, u8 sid, u32 flag, u32 mask)

Gfx *gdl;    display list pointer

Gfx *bdl;    display list to branch to

u8 sid;      Status ID { 0, 4, 8, or 12 }

u32 flag;    Status flag

u32 mask;    Status mask

g[s]SPSelectBranchDL examines Status[sid] using the same method used for texture load
decision making and, depending on the True/False result, branches to another display list.

g[s]SPSelectBranchDL determines whether or not to branch to the display list using the following steps.

• Check the condition (Status[sid] & mask) == flag

• If the result is true, finish the GBI without doing anything.

• If the result is false, change Status[sid] by performing:

      Status[sid] = (Status[sid] & ~mask) | (flag & mask);

  and branch to display list "bdl".
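
The difference between calling and branching can be illustrated with a toy display list interpreter. The sketch assumes that a called list returns to the caller when it ends while a branch does not return, as with gSPDisplayList and gSPBranchList. The command set and the names ToyCmd and run_toy_dl are hypothetical and do not reflect the real GBI encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command set for a toy display list. */
typedef enum { CMD_MARK, CMD_SELECT_CALL, CMD_SELECT_BRANCH, CMD_END } CmdKind;

typedef struct ToyCmd {
    CmdKind kind;
    int mark;                      /* CMD_MARK: value written to the trace */
    const struct ToyCmd *target;   /* SELECT commands: the other list */
    uint32_t flag, mask;           /* SELECT commands: Status test */
} ToyCmd;

#define TRACE_MAX 16
#define STACK_MAX 8

/* Runs a toy display list against one Status word and records every
 * CMD_MARK it executes. Returns the number of trace entries written. */
size_t run_toy_dl(const ToyCmd *dl, uint32_t *status, int trace[TRACE_MAX])
{
    const ToyCmd *stack[STACK_MAX];
    size_t depth = 0, n = 0;

    for (;;) {
        const ToyCmd *cmd = dl++;
        switch (cmd->kind) {
        case CMD_MARK:
            if (n < TRACE_MAX) trace[n++] = cmd->mark;
            break;
        case CMD_SELECT_CALL:
        case CMD_SELECT_BRANCH:
            if ((*status & cmd->mask) == cmd->flag) break;  /* true: skip */
            *status = (*status & ~cmd->mask) | (cmd->flag & cmd->mask);
            if (cmd->kind == CMD_SELECT_CALL && depth < STACK_MAX)
                stack[depth++] = dl;    /* call: remember the return point */
            dl = cmd->target;           /* branch: no return point saved */
            break;
        case CMD_END:
            if (depth == 0) return n;   /* end of the outermost list */
            dl = stack[--depth];        /* return to the caller */
            break;
        }
    }
}
```

With a list of the form MARK 1, SELECT, MARK 2, END and a sub-list MARK 9, END, the call variant records 1, 9, 2 when the condition is false, while the branch variant records only 1, 9, because processing ends with the sub-list. When the condition is true, both variants record 1, 2.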

Chapter 4 Emulation Functions

These are functions for using the CPU to emulate S2DEX GBI functions. 

guS2DEmuBgRect1Cyc

void guS2DEmuBgRect1Cyc(Gfx **gdl_p, uObjBg *bg);

This function uses the CPU to emulate the action of the S2DEX GBI gSPBgRect1Cyc by combining
other GBIs.

Parameters:  gdl_p   Pointer to the display list pointer
                     * The value of gdl_p is updated automatically.
             bg      Pointer to the uObjBg structure

A call to gSPBgRect1Cyc(gdl++, bg) can be replaced by guS2DEmuBgRect1Cyc(&gdl, bg).
Refer to "gSPBgRect1Cyc" on page 13 for an explanation of the parameter bg.

In addition, to inform the main routine of the scissor box and Texture Filter settings, the
function guS2DEmuSetScissor, discussed below, must be called before guS2DEmuBgRect1Cyc.

This function produces GBIs that work not only in S2DEX but in the F3DEX series as well.
Because of this, a single microcode can be used when displaying a scaled scrolling BG screen and a
3D model at the same time.

guS2DEmuSetScissor

void guS2DEmuSetScissor(u32 ulx, u32 uly, u32 lrx, u32 lry, u8 bilerp);

This function sets the scissoring parameters and Texture Filter that are referenced when the function
guS2DEmuBgRect1Cyc is processed.

Parameters:  ulx     upper left X coordinate of scissor box (u10.0)
             uly     upper left Y coordinate of scissor box (u10.0)
             lrx     lower right X coordinate of scissor box (u10.0)
             lry     lower right Y coordinate of scissor box (u10.0)
             bilerp  set to a value other than 0 to perform Bilerp interpolation
                     processing on the image, or set to 0 for PointSample.

Normally, the range of the scissor box set by g[s]DPSetScissor is passed to this function as its
parameters. In addition, the initial values for ulx, uly, lrx, lry, and bilerp are 0, 0, 320, 240,
0, respectively, which are settings that draw to a 320x240 pixel frame buffer with PointSample.

This function only needs to be called once before guS2DEmuBgRect1Cyc is called. As long as there is
no change in the scissor box and Texture Filter, it only needs to be called once during game initialization,
and doesn't need to be called every time a frame is drawn.
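
The calling pattern can be pictured with the small model below. It is not the library's implementation. The names ScissorModel, model_set_scissor and model_draw_frame are hypothetical stand-ins, and the stored defaults are the initial values for a 320x240 frame buffer drawn with PointSample.

```c
#include <stdint.h>

/* Hypothetical model of the settings kept for the emulation routine. */
typedef struct {
    uint32_t ulx, uly, lrx, lry;   /* scissor box corners (u10.0) */
    uint8_t  bilerp;               /* 0: PointSample, non-zero: Bilerp */
} ScissorModel;

/* Initial values: 320x240 frame buffer, PointSample. */
static ScissorModel scissor_model = { 0, 0, 320, 240, 0 };

/* Stand-in for guS2DEmuSetScissor: stores the settings for later calls. */
void model_set_scissor(uint32_t ulx, uint32_t uly,
                       uint32_t lrx, uint32_t lry, uint8_t bilerp)
{
    scissor_model.ulx = ulx;
    scissor_model.uly = uly;
    scissor_model.lrx = lrx;
    scissor_model.lry = lry;
    scissor_model.bilerp = bilerp;
}

/* Stand-in for the per-frame drawing call. It only reads the stored
 * settings, so they persist from frame to frame. Returns the scissor
 * area in pixels. */
uint32_t model_draw_frame(void)
{
    return (scissor_model.lrx - scissor_model.ulx) *
           (scissor_model.lry - scissor_model.uly);
}
```

In this model the settings are written once at initialization and every later frame reuses them, which is why the setter does not have to be repeated unless the scissor box or filter changes.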

Chapter 5 DEBUG Information Output Function

There are 2 versions of S2DEX Microcode; one version for debugging and another version for release. 
The relationship between the two microcodes is the same as the relationship between libultra_rom.a and 
libultra_d.a. 

Although the debug version of the microcode, S2DEX_d, is slower than the release version, it
has the following additional features.

• Outputs the display list processing log.

• In the event of bad input or undefined commands, stops the RSP and reports the
  problem to the CPU.

Checking the display list processing log makes it easier to investigate problems, such as finding
the cause of a runaway RSP.

To use S2DEX_d, it is necessary to prepare an output buffer for the RSP display list processing log. The
buffer must be the same size as the display list, and must be 8-byte aligned.
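
A minimal sketch of reserving such a buffer is shown below, assuming a C11 compiler. The names DL_BYTES, display_list, rsp_log_buf and log_buffer_ok are hypothetical, and the display list size is an arbitrary example value.

```c
#include <stddef.h>
#include <stdint.h>

#define DL_BYTES (1024 * 8)   /* hypothetical display list size in bytes */

/* Stand-in for the display list storage. */
static _Alignas(8) unsigned char display_list[DL_BYTES];

/* Log output buffer: same size as the display list, 8-byte aligned. */
static _Alignas(8) unsigned char rsp_log_buf[DL_BYTES];

/* Returns 1 if buf meets both requirements: 8-byte alignment and the
 * same size as the display list. Returns 0 otherwise. */
int log_buffer_ok(const void *buf, size_t buf_bytes, size_t dl_bytes)
{
    return ((uintptr_t)buf % 8u == 0) && (buf_bytes == dl_bytes);
}
```

On a compiler without C11 _Alignas, declaring the buffer as an array of 64-bit integers is a common way to obtain the same alignment.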

Once the area is reserved, set data_size, a member variable of the OSTask structure, to the pointer
to the first address of the area. In the S2DEX and F3DEX series, this member variable is not used
with its original meaning, the size of the DL. It is a remnant of N64 OS/Library Version 1.0
and is used here to specify the log output buffer.

This address must not be a segment address. When gspS2DEX.fifo_d.o runs as the microcode, the
processing log is stored at the specified address.

For details concerning the processing log's display methods, please refer to the function
ucDebugGfxLogPrint() in the sample program uc_assert.c. Also, for details concerning the
decision making process for stopping the RSP, please refer to ucCheckAssert() in the same file.

Chapter 6 Installation of S2DEX Package

The description here applies to S2DEX Microcode when it is received as a patch. If the package is 
included in the N64 OS/Library that you received, the work described here is not necessary. 

S2DEX Microcode consists of the following files: 

gspS2DEX.fifo.o          S2DEX Microcode
gspS2DEX.fifo_d.o        S2DEX Microcode (for Debugging)
include/gs2dex.h         Include file for S2DEX
libultra/Makefile        Makefile for updating libultra
libultra/us2dex.o        Initialization routine for BG structure
libultra/us2dex_emu.o    Scalable BG drawing routine
sample/*                 S2DEX Sample programs

The libultra*.a files are created by executing the make command in the libultra directory. Copy the
libultra*.a files to /usr/lib. Also, copy gspS2DEX.fifo.o and gspS2DEX.fifo_d.o to
/usr/lib/PR, and copy include/gs2dex.h to /usr/include/PR.

In addition, perl is required to build the accompanying sample programs. Please install the following
packages from the IRIX 5.3/6.x CD.

For IRIX 5.3: 

eoe2.sw.gifts_perl 

For IRIX 6.x: 

eoe2.sw.gifts_perl 